Preface


It is with great pleasure that we present the proceedings of Asiacrypt 2014 in two
volumes of Lecture Notes in Computer Science published by Springer. The year
2014 marked the 20th edition of the International Conference on Theory and
Application of Cryptology and Information Security held annually in Asia by
the International Association for Cryptologic Research (IACR). The conference
was sponsored by the IACR and was jointly organized by the following consortium
of universities and government departments of the Republic of China
(Taiwan): National Sun Yat-sen University; Academia Sinica; Ministry of Science
and Technology; Ministry of Education; and Ministry of Economic Affairs.
The conference was held in Kaohsiung, Republic of China (Taiwan), during
December 7-11, 2014.

An international Program Committee (PC) consisting of 48 scientists was
formed approximately one year earlier with the objective of determining the
scientific content of the conference. As in previous editions, Asiacrypt 2014
stimulated great interest among the scientific community of cryptologists. A total
of 255 technical papers were submitted for possible presentation approximately
six months prior to the conference. Authors of the submitted papers are spread
all over the world. Each PC member could submit at most two co-authored
papers or at most one single-authored paper, and the PC co-chairs did not
submit any paper. All the submissions were screened by the PC members and 55
papers were finally selected for presentation at the conference. These proceedings
contain the revised versions of the papers that were selected. The revisions were
not checked and the responsibility for the papers rests with the authors and not
the PC members.

The selection of papers for presentation was made through a double-blind
review process. Each paper was assigned four reviewers and submissions by PC
members were assigned five reviewers. Apart from the PC members, the selection
process was assisted by a total of 397 external reviewers. The total number of
reviews for all the papers was more than 1,000. In addition to the reviews, the
selection process involved an extensive discussion phase. This phase allowed PC
members to express their opinions on all the submissions. The final selection of 55
papers was the result of this extensive and rigorous selection procedure.

The decision on the best paper award was based on a vote among the PC
members, and it was conferred upon the paper “Solving LPN Using Covering
Codes” authored by Qian Guo, Thomas Johansson, and Carl Londahl. In addition
to the best paper, three other papers were recommended to receive invitations
from the Editor-in-Chief of the Journal of Cryptology to submit expanded versions
to the journal. These papers are “Secret-Sharing for NP” authored by
Ilan Komargodski, Moni Naor, and Eylon Yogev; “Mersenne Factorization Factory”
authored by Thorsten Kleinjung, Joppe W. Bos, and Arjen K. Lenstra; and
“Jacobian Coordinates on Genus 2 Curves” authored by Huseyin Hisil and Craig
Costello.

In addition to the regular presentations, the conference featured two invited
talks. The invited speakers were decided through an extensive multi-round
discussion among the PC members. This resulted in very interesting talks on two
different aspects of the subject. Kenneth G. Paterson spoke on “Big Bias Hunting
in Amazonia: Large-Scale Computation and Exploitation of RC4 Biases,” a
topic of importance to practical cryptography, while Helaine Leggat spoke on
“The Legal Infrastructure Around Information Security in Asia,” which
appealed to a wide audience.

Along with the regular presentations and the invited talks, a rump session was
organized. This session contained short presentations on the latest research results,
announcements of future events, and other topics of interest to the audience.

Many people contributed to Asiacrypt 2014. We would like to thank the authors
of all papers for submitting their research to the conference. Thanks
are due to the PC members for their enthusiastic and continued participation for
over a year in different aspects of selecting the technical program. The selection
of the papers was made possible by the timely reviews from external reviewers,
and thanks are due to them. A list of external reviewers is provided in these
proceedings. We have tried to ensure that the list is complete. Any omission is
inadvertent, and we apologize to anyone who has been omitted.

Special thanks are due to D. J. Guan, the general chair of the conference, for 
working closely with us and ensuring that the PC co-chairs were insulated from 
the organizational work. This work was carried out by the Organizing Committee 
and they deserve thanks from all the participants for the wonderful experience. 
We thank Daniel J. Bernstein and Tanja Lange for expertly organizing and 
chairing the rump session. 

We thank Shai Halevi for developing the IACR conference management software,
which was used for the whole process of submission, reviewing, discussions,
and preparing these proceedings. We thank Josh Benaloh, our IACR liaison,
and San Ling, Asiacrypt Steering Committee Representative, for guidance and
advice on several issues. Springer published the volumes and made them available
before the conference. We thank Alfred Hofmann, Anna Kramer, Christine
Reiss and their team for the professional and efficient handling of the production
process.


December 2014 


Palash Sarkar 
Tetsu Iwata 
Abstract. We present a new algorithm for solving the LPN problem.
The algorithm has a similar form to some previous methods, but includes
a new key step that makes use of approximations of random words to
a nearest codeword in a linear code. It outperforms previous methods
for many parameter choices. In particular, we can now solve instances
suggested for 80-bit security in cryptographic schemes like HB variants,
LPN-C and Lapin, in fewer than $2^{80}$ operations.


1 Introduction 

In recent years, much effort in cryptography has been devoted to finding
efficient and secure low-cost cryptographic primitives targeting applications in
very constrained hardware environments (such as RFID tags and low-power
devices). Many proposals rely on the hardness assumption of Learning Parity
with Noise (LPN), a fundamental problem in learning theory, which recently
has also gained a lot of attention within the cryptographic community. The LPN
problem is well studied and it is intimately related to the problem of decoding
random linear codes, which is one of the most important problems in coding
theory. Being a supposedly hard problem¹, the LPN problem is a good candidate
for post-quantum cryptography, where other classically hard problems such as
factoring and the discrete log problem fall short. The inherent properties of LPN
also make it ideal for lightweight cryptography.

The first time the LPN problem was employed in a cryptographic construction
was in the Hopper-Blum (HB) identification protocol [17]. HB is a minimalistic
protocol that is secure in a passive attack model. Aiming to secure the HB scheme
also in an active attack model, Juels and Weis [?], and Katz and Shin [?] proposed
a modified scheme. The modified scheme, which was given the name HB+,
extends HB with one extra round. It was later shown by Gilbert et al. [?] that
the HB+ protocol is vulnerable to active attacks, i.e., man-in-the-middle attacks,
where the adversary is allowed to intercept and attack an ongoing authentication
session to learn the secret. Gilbert et al. [?] subsequently proposed a variant
of the Hopper-Blum protocol called HB#. Apart from repairing the protocol,
the designers of HB# introduced a more efficient key representation using a
variant of LPN called Toeplitz-LPN.

* Supported in part by the National Natural Science Foundation of China (Grant No.
61170208) and Shanghai Key Program of Basic Research (Grant No. 12JC1401400).

** Supported by the Swedish Research Council (Grant No. 621-2012-4259).

1 LPN with adversarial error is NP-hard.

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, Part I, LNCS 8873, pp. 1-20, 2014.

© International Association for Cryptologic Research 2014

In [13], Gilbert et al. proposed a way to use LPN in encryption of messages,
which resulted in the cryptosystem LPN-C. Kiltz et al. [22] and Dodis et al. [?]
showed how to construct message authentication codes (MACs) using LPN. The
existence of MACs allows one to construct identification schemes that are provably
secure against active attacks. The most recent contributions to LPN-based
constructions are a two-round identification protocol called Lapin, proposed by
Heyse et al. [?], and an LPN-based encryption scheme called Helen, proposed
by Duc and Vaudenay [?]. The Lapin protocol is based on an LPN variant
called Ring-LPN, where the samples are elements of a polynomial ring.

The two major threats against LPN-based cryptographic constructions are
generic algorithms that decode random linear codes (information set decoding
(ISD)) and variants of the BKW algorithm, originally proposed by Blum et
al. [3]. Being the asymptotically most efficient² approach, the BKW algorithm
employs an iterated collision procedure on the queries. In each iteration, colliding
entries sum together to produce a new entry with smaller dependency on the
information bits but with an increased noise level. Once the dependency on
sufficiently many information bits has been removed, the remaining bits are
found by exhaustive search. Although the collision procedure is the main reason
for the efficiency of the BKW algorithm, it leads to a requirement of an immense
number of queries compared to ISD. Notably, for some cases, e.g., when the noise
is very low, ISD yields the most efficient attack.

Levieil and Fouque [26] proposed to use the Fast Walsh-Hadamard Transform in
the BKW algorithm when searching for the secret. In an unpublished paper,
Kirchner [23] suggested transforming the problem into systematic form, where
each information (key) bit then appears as an observed symbol, perturbed by
noise. This requires the adversary to exhaust only the biased noise variables
rather than the key bits. When the error rate is low, the noise variable search
space is very small and this technique decreases the attack complexity. Building
on the work by Kirchner [23], Bernstein and Lange [5] showed that the ring
structure of Ring-LPN can be exploited in matrix inversion, further reducing
the complexity of attacks on, for example, Lapin. None of the known algorithms
manages to break the 80-bit security of Lapin. Nor do they break the parameters
proposed in [26], which were suggested as design parameters of LPN-C [13] for
80-bit security.

In this paper, we propose a new algorithm for solving the LPN problem based
on [23,5]. We employ a new technique that we call subspace distinguishing, which
exploits coding theory to decrease the dimension of the secret. The trade-off is
a small increase in the sample noise. Our novel algorithm performs favorably in
comparison to state-of-the-art algorithms and we manage to break previously
unbroken parameters of HB variants, Lapin and LPN-C. As an example, we
attack the common (512, 1/8)-instance of LPN and break its 80-bit security
barrier. A comparison of the complexity of different algorithms³ is shown in Table 1.

Table 1. Comparison of different algorithms for solving LPN with parameters (512, 1/8)

                            Complexity (log2)
Algorithm               Queries    Time    Memory
Levieil-Fouque [26]       75.7     87.5     84.8
Bernstein-Lange [5]       68.6     85.7     77.6
New algorithm             66.3     79.9     75.3
The organization of the paper is as follows. In Section 2 we give some
preliminaries and introduce the LPN problem in detail. Moreover, in Section 3 we
give a short description of the BKW algorithm. We briefly describe the general
idea of our new attack in Section 4 and more formally in Section 5. In Section 6
we analyze its complexity. The results when the algorithm is applied to various
LPN-based cryptosystems are given in Section 7, and in Section 8 we describe
some aspects of the covering-coding technique. Section 9 concludes the paper.


2 The LPN Problem 

We will now give a more thorough description of the LPN problem. Let $\mathrm{Ber}_\eta$ be
the Bernoulli distribution and let $X \sim \mathrm{Ber}_\eta$ be a random variable with alphabet
$\mathcal{X} = \{0, 1\}$. Then, $\Pr[X = 1] = \eta$ and $\Pr[X = 0] = 1 - \Pr[X = 1] = 1 - \eta$. The
bias $\epsilon$ of $X$ is given by $\Pr[X = 0] = \frac{1}{2}(1 + \epsilon)$. Let $k$ be a security parameter
and let $\mathbf{x}$ be a binary vector of length $k$.

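Definition 1 (LPN oracle). An LPN oracle $\Pi_{\mathrm{LPN}}$ for an unknown vector $\mathbf{x} \in \{0, 1\}^k$
with $\eta \in (0, \frac{1}{2})$ returns pairs of the form
$$\left(\mathbf{g} \xleftarrow{\$} \{0, 1\}^k,\ \langle \mathbf{x}, \mathbf{g} \rangle + e\right),$$
where $e \leftarrow \mathrm{Ber}_\eta$. Here, $\langle \mathbf{x}, \mathbf{g} \rangle$ denotes the scalar product of vectors $\mathbf{x}$ and $\mathbf{g}$.

We also write $\langle \mathbf{x}, \mathbf{g} \rangle$ as $\mathbf{x} \cdot \mathbf{g}^T$, where $\mathbf{g}^T$ is the transpose of the row vector $\mathbf{g}$.
We receive a number $n$ of noisy versions of scalar products of $\mathbf{x}$ from the oracle
$\Pi_{\mathrm{LPN}}$, and our task is to recover $\mathbf{x}$.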
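To make the oracle concrete, here is a minimal Python sketch of $\Pi_{\mathrm{LPN}}$, assuming bit vectors are stored as Python integers (bit $j$ holds coordinate $j$); the names `lpn_samples` and `bias` are illustrative only. It returns the columns $\mathbf{g}_i$ together with the noisy symbols $z_i = \langle \mathbf{x}, \mathbf{g}_i \rangle + e_i$, and `bias` restates $\Pr[e = 0] = 1 - \eta = \frac{1}{2}(1 + \epsilon)$, i.e., $\epsilon = 1 - 2\eta$.

```python
import random


def lpn_samples(x_bits, n, eta, rng):
    """Hypothetical LPN oracle: n pairs (g, <x, g> + e) with e ~ Ber_eta.

    x_bits is a list of 0/1 values; each g is returned as a k-bit integer
    whose bit j is coordinate j. Returns (columns, z).
    """
    k = len(x_bits)
    secret = sum(bit << j for j, bit in enumerate(x_bits))
    columns, z = [], []
    for _ in range(n):
        g = rng.getrandbits(k)                   # g uniform in {0,1}^k
        inner = bin(secret & g).count("1") & 1   # <x, g> over GF(2)
        e = 1 if rng.random() < eta else 0       # Bernoulli(eta) noise bit
        columns.append(g)
        z.append(inner ^ e)
    return columns, z


def bias(eta):
    """Bias eps defined by Pr[e = 0] = (1 + eps) / 2, i.e. eps = 1 - 2*eta."""
    return 1 - 2 * eta
```

For example, `lpn_samples(x, n, 0.125, random.Random(2014))` produces the $n$ columns of $\mathbf{G}$ and the vector $\mathbf{z} = \mathbf{x}\mathbf{G} + \mathbf{e}$ for the noise rate of the (512, 1/8) instance; with $\eta = 0$ it returns the noiseless products.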

3 The Bernstein-Lange algorithm was originally proposed for Ring-LPN, and by a slight
modification [5], one can apply it to LPN instances as well. It shares the beginning
steps (i.e., the steps of Gaussian elimination and the collision procedure)
with the new algorithm, so for a fair comparison, we use the same implementation
of these steps when computing their complexity.
Problem 1 (LPN). Given an LPN oracle $\Pi_{\mathrm{LPN}}$, the $(k, \eta)$-LPN problem consists
of finding the vector $\mathbf{x}$. An algorithm $\mathcal{A}_{\mathrm{LPN}}(t, n, \delta)$ using time at most $t$
with at most $n$ oracle queries solves $(k, \eta)$-LPN if
$$\Pr\left[\mathcal{A}_{\mathrm{LPN}}(t, n, \delta) = \mathbf{x} : \mathbf{x} \xleftarrow{\$} \{0, 1\}^k\right] \ge \delta.$$

Let $\mathbf{y}$ be a vector of length $n$ and let $y_i = \langle \mathbf{x}, \mathbf{g}_i \rangle$. For known random vectors
$\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_n$, we can easily reconstruct an unknown $\mathbf{x}$ from $\mathbf{y}$ using linear
algebra. In the LPN problem, however, we instead receive noisy versions of
$y_i$, $i = 1, 2, \ldots, n$. Writing the noise in position $i$ as $e_i$, $i = 1, 2, \ldots, n$, we obtain
$z_i = y_i + e_i = \langle \mathbf{x}, \mathbf{g}_i \rangle + e_i$. In matrix form, the same is written as $\mathbf{z} = \mathbf{x}\mathbf{G} + \mathbf{e}$,
where $\mathbf{z} = [z_1\ z_2\ \cdots\ z_n]$, and the matrix $\mathbf{G}$ is formed as $\mathbf{G} = [\mathbf{g}_1^T\ \mathbf{g}_2^T\ \cdots\ \mathbf{g}_n^T]$.
This shows that the LPN problem is simply a decoding problem, where $\mathbf{G}$ is a
random $k \times n$ generator matrix, $\mathbf{x}$ is the information vector and $\mathbf{z}$ is the received
vector after transmission of a codeword on the binary symmetric channel with
error probability $\eta$.


2.1 Piling-up Lemma

We recall the piling-up lemma, which is frequently used in the analysis of the LPN
problem.

Lemma 1 (Piling-up lemma). Let $X_1, X_2, \ldots, X_n$ be independent binary random
variables where each $\Pr[X_i = 0] = \frac{1}{2}(1 + \epsilon_i)$, for $1 \le i \le n$. Then,
$$\Pr[X_1 + X_2 + \cdots + X_n = 0] = \frac{1}{2}\left(1 + \prod_{i=1}^{n} \epsilon_i\right).$$
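The lemma is easy to sanity-check numerically. The sketch below (function names are illustrative) enumerates all $2^n$ outcomes of independent bits with $\Pr[X_i = 0] = \frac{1}{2}(1 + \epsilon_i)$ and compares the exact probability of an even sum with $\frac{1}{2}(1 + \prod_i \epsilon_i)$; with two copies of $\epsilon$ it reproduces the $\frac{1}{2}(1 + \epsilon^2)$ used when two samples are merged.

```python
from itertools import product
from math import isclose, prod


def prob_sum_zero_exact(epsilons):
    """Pr[X_1 + ... + X_n = 0 (mod 2)] by enumerating all 2^n outcomes."""
    total = 0.0
    for bits in product((0, 1), repeat=len(epsilons)):
        p = 1.0
        for b, eps in zip(bits, epsilons):
            p0 = (1 + eps) / 2                 # Pr[X_i = 0]
            p *= p0 if b == 0 else 1 - p0
        if sum(bits) % 2 == 0:
            total += p
    return total


def prob_sum_zero_lemma(epsilons):
    """Right-hand side of the piling-up lemma."""
    return (1 + prod(epsilons)) / 2


eps = [0.8, 0.5, 0.3, -0.2]
print(isclose(prob_sum_zero_exact(eps), prob_sum_zero_lemma(eps)))
```

Exhaustive enumeration is used only because it is transparent; it is exponential in $n$ and meant for small checks.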



3 The BKW Algorithm

The BKW algorithm is due to Blum, Kalai and Wasserman [3]. In the spirit of
generalized birthday algorithms, their approach uses an iterative sort-and-match
procedure on the columns of the generator matrix $\mathbf{G}$, which iteratively reduces
the dimension of $\mathbf{G}$.

Initially, one searches for all combinations of two columns in $\mathbf{G}$ that add to
zero in the last $b$ entries. Assume that one finds two columns $\mathbf{g}_{i_1}^T, \mathbf{g}_{i_2}^T$ such that
$$\mathbf{g}_{i_1} + \mathbf{g}_{i_2} = [\,*\ *\ \cdots\ *\ \underbrace{0\ 0\ \cdots\ 0}_{b \text{ symbols}}\,],$$
where $*$ means any value. Then a new vector $\mathbf{g}_1^{(2)} = \mathbf{g}_{i_1} + \mathbf{g}_{i_2}$ is formed. An
“observed symbol” is also formed, corresponding to this new column, by forming
$z_1^{(2)} = z_{i_1} + z_{i_2}$. If $y_1^{(2)} = \langle \mathbf{x}, \mathbf{g}_1^{(2)} \rangle$, then $z_1^{(2)} = y_1^{(2)} + e_1^{(2)}$, where now
$e_1^{(2)} = e_{i_1} + e_{i_2}$. It can be verified that $\Pr[e_1^{(2)} = 0] = \frac{1}{2}(1 + \epsilon^2)$.
There are two approaches to realize the above merging procedure. The first, proposed
by Blum et al. [3], called LF1 by Levieil and Fouque [26], and later adopted
by Bernstein and Lange [5], is to choose one sample in each partition with the
same last $b$ entries, and then add it to the remaining samples in the same
partition. Thus, the number of samples is reduced by about $2^b$ after this operation.
The other method is a heuristic called LF2 in [26], which combines every pair of
samples with the same last $b$ entries. It produces more samples at the cost of
increased dependency, thereby gaining more efficiency in practice but losing
rigorous analysis in theory. We will use the LF1 setting throughout the remaining
part of the paper.

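Put all such new columns in a matrix $\mathbf{G}_2$,
$$\mathbf{G}_2 = \left[\mathbf{g}_1^{(2)T}\ \mathbf{g}_2^{(2)T}\ \cdots\ \mathbf{g}_{n-2^b}^{(2)T}\right].$$
If $n$ is the number of columns in $\mathbf{G}$, then the number of columns in $\mathbf{G}_2$ will
be $n - 2^b$. Note that the last $b$ entries of every column in $\mathbf{G}_2$ are all zero. In
connection to this matrix, the vector of observed symbols is
$$\mathbf{z}_2 = \left[z_1^{(2)}\ z_2^{(2)}\ \cdots\ z_{n-2^b}^{(2)}\right],$$
where $\Pr[z_i^{(2)} = y_i^{(2)}] = \frac{1}{2}(1 + \epsilon^2)$, for $1 \le i \le n - 2^b$.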


We now iterate the same, picking one column and then adding it to another
suited column in $\mathbf{G}_2$, giving a sum with an additional $b$ entries being zero, forming
the columns of $\mathbf{G}_3$. Repeating the same procedure an additional $t - 2$ times
will reduce the number of unknown variables to $k - bt$ in the remaining problem.

For each iteration the bias is squared. By the piling-up lemma we have that
$$\Pr\left[\sum_{j=1}^{2^t} e_j = 0\right] = \frac{1}{2}\left(1 + \epsilon^{2^t}\right).$$
Hence, the bias decreases quickly to low levels. The remaining unknown key
variables are guessed and for each guess we check whether the bias is present or
not. The procedure is summarized in Algorithm 1.
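A toy, end-to-end version of this procedure fits in a few lines. The sketch below is an illustration under stated assumptions, not the original implementation: columns are Python integers, each column carries its observed symbol (equivalent to putting $\mathbf{z}$ as an extra row of the matrix), the LF1 variant is used, step $i$ cancels coordinates $k - ib, \ldots, k - (i-1)b - 1$, and the final step picks the candidate for the remaining $k - bt$ bits that agrees with the most reduced samples (the minimal-weight criterion). The helper names and the parameters $k = 12$, $b = 4$, $t = 2$, $\eta = 0.05$ are arbitrary choices that keep it fast.

```python
import random


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def lpn_oracle(secret: int, k: int, n: int, eta: float, rng: random.Random):
    """n samples (g, <secret, g> + e), g a uniform k-bit integer, e ~ Ber_eta."""
    samples = []
    for _ in range(n):
        g = rng.getrandbits(k)
        e = 1 if rng.random() < eta else 0
        samples.append((g, parity(secret & g) ^ e))
    return samples


def bkw_toy(samples, k: int, b: int, t: int):
    """Toy LF1 BKW; returns (guess for the low k - b*t secret bits, #samples left)."""
    for i in range(1, t + 1):
        mask = ((1 << b) - 1) << (k - i * b)    # the b coordinates cancelled in step i
        representatives, merged = {}, []
        for g, z in samples:
            key = g & mask
            if key not in representatives:
                representatives[key] = (g, z)    # LF1: one representative per class...
            else:
                rg, rz = representatives[key]
                merged.append((g ^ rg, z ^ rz))  # ...added to every other class member
        samples = merged                         # each noise term is now a sum of 2^i bits

    k_rem = k - b * t

    def agreements(candidate: int) -> int:
        return sum(1 for g, z in samples if parity(g & candidate) == z)

    # Exhaust the remaining key bits; the right guess shows bias eps^(2^t).
    best = max(range(1 << k_rem), key=agreements)
    return best, len(samples)


rng = random.Random(7)
k, b, t = 12, 4, 2
secret = rng.getrandbits(k)
samples = lpn_oracle(secret, k, 1 << 14, 0.05, rng)
guess, m = bkw_toy(samples, k, b, t)
print(guess == secret & ((1 << (k - b * t)) - 1), m)
```

With $2^{14}$ samples each LF1 step drops one representative per class, i.e., $2^b$ samples, leaving $m = n - t2^b$, while the surviving noise has bias $0.9^4 \approx 0.66$, far above what is needed to single out the right 4-bit guess.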


4 Essential Idea

In this section we try to give a very basic description of the idea used to give a
new and more efficient algorithm for solving the LPN problem. A more detailed
analysis will be provided in later sections, and a graphical interpretation of the
key step is given in Appendix A.

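Assume that we have an initial LPN problem described by $\mathbf{G} = [\mathbf{g}_1^T\ \mathbf{g}_2^T\ \cdots\ \mathbf{g}_n^T]$
and $\mathbf{z} = \mathbf{x}\mathbf{G} + \mathbf{e}$, where $\mathbf{z} = [z_1\ z_2\ \cdots\ z_n]$ and $z_i = y_i + e_i = \langle \mathbf{x}, \mathbf{g}_i \rangle + e_i$.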

Algorithm 1. BKW Algorithm

Input: Matrix $\mathbf{G}$ with $k$ rows and $n$ columns and received vector $\mathbf{z}$, algorithm
parameters $b$, $t$

  Put the received word as a first row in the matrix, $\mathbf{G}_1$;
  for i = 1 to t do
      For $\mathbf{G}_i$, partition the columns by the last $b \cdot i$ bits;
      Form pairs of columns from each partition and form $\mathbf{G}_{i+1}$;
  for $\mathbf{x} \in \{0, 1\}^{k-bt}$ do
      Find the vector $[1\ \mathbf{x}\ \mathbf{0}]$ such that $[1\ \mathbf{x}\ \mathbf{0}]\mathbf{G}_{t+1}$ has minimal weight;

As previously shown in [23] and [5], we may transform $\mathbf{G}$ into systematic form through
Gaussian elimination. Assume that the first $k$ columns are linearly independent and let
$\mathbf{D}$ denote the inverse of the $k \times k$ matrix they form. With a change of variables
$\hat{\mathbf{x}} = \mathbf{x}\mathbf{D}^{-1}$ we get an equivalent problem description with
$\hat{\mathbf{G}} = \mathbf{D}\mathbf{G} = [\mathbf{I}\ \hat{\mathbf{g}}_{k+1}^T\ \hat{\mathbf{g}}_{k+2}^T\ \cdots\ \hat{\mathbf{g}}_n^T]$. We compute
$$\hat{\mathbf{z}} = \mathbf{z} + [z_1, z_2, \ldots, z_k]\hat{\mathbf{G}} = [0, \hat{z}_{k+1}, \hat{z}_{k+2}, \ldots, \hat{z}_n].$$

In this situation, one may start performing a number of BKW steps on
columns $k + 1$ to $n$, reducing the dimension $k$ of the problem to something
smaller. This will result in a new problem instance where noise in each position
is larger, except for the first systematic positions. We may write the problem
after performing $t$ BKW steps in the form $\mathbf{G}' = [\mathbf{I}\ \mathbf{g}_1'^T\ \mathbf{g}_2'^T\ \cdots\ \mathbf{g}_m'^T]$ and
$\mathbf{z}' = [\mathbf{0}\ z_1'\ z_2'\ \cdots\ z_m']$, where now $\mathbf{G}'$ has dimension $k' \times m$ with $k' = k - bt$ and $m$
is the number of columns remaining after the BKW steps. We have $\mathbf{z}' = \mathbf{x}'\mathbf{G}' + \mathbf{e}'$,
$\Pr[x_i' = 0] = \frac{1}{2}(1 + \epsilon)$ and $\Pr[\mathbf{x}' \cdot \mathbf{g}_i'^T = z_i'] = \frac{1}{2}(1 + \epsilon^{2^t})$.

Now we will explain the basics of the new idea proposed in the paper. In a
problem instance as above, we may look at the random variables $y_i' = \mathbf{x}' \cdot \mathbf{g}_i'^T$.
The bits in $\mathbf{x}'$ are mostly zero but a few are set to one. Let us assume that $c$
bits are set to one. Furthermore, $\mathbf{x}'$ is fixed for all $i$. We usually assume that
$\mathbf{g}_i'$ is generated according to a uniform distribution. However, assume that every
column $\mathbf{g}_i'$ would be biased, i.e., every bit in a column position is zero with
probability $\frac{1}{2}(1 + \epsilon')$. Then we observe that the variables $y_i'$ will be biased, as
$$y_i' = \langle \mathbf{x}', \mathbf{g}_i' \rangle = \sum_{j=1}^{c} [\mathbf{g}_i']_{k_j},$$
where $k_1, k_2, \ldots, k_c$ are the bit positions where $\mathbf{x}'$ has value one (here $[\mathbf{x}]_y$ denotes
bit $y$ of vector $\mathbf{x}$). In fact, the variables $y_i'$ will have bias $(\epsilon')^c$.

So how do we get the columns to be biased in the general case? We could
simply hope for some of them to be biased, but if we need to use a larger
number of columns, the bias would have to be small, giving a high complexity
for an algorithm solving the problem. We propose instead to use a covering code
to achieve something similar to what is described above. Vectors $\mathbf{g}_i'$ are of length
$k'$, so we consider a code of length $k'$ and some dimension $l$. Let us assume that
the generator matrix of this code is denoted $\mathbf{F}$. For each vector $\mathbf{g}_i'$, we now find
the codeword in the code spanned by $\mathbf{F}$ that is closest (in Hamming sense) to
$\mathbf{g}_i'$. Assume that this codeword is denoted $\mathbf{c}_i$. Then we can write
$$\mathbf{g}_i' = \mathbf{c}_i + \mathbf{e}_i',$$
where $\mathbf{e}_i'$ is a vector with biased bits. It remains to examine exactly how biased
the bits in $\mathbf{e}_i'$ will be, but assume for the moment that the bias is $\epsilon'$. Going back
to our previous expressions we can write
$$y_i' = \langle \mathbf{x}', \mathbf{g}_i' \rangle = \mathbf{x}' \cdot (\mathbf{c}_i + \mathbf{e}_i')^T,$$
and since $\mathbf{c}_i = \mathbf{u}_i\mathbf{F}$ for some $\mathbf{u}_i$, we can write
$$y_i' = \mathbf{x}'\mathbf{F}^T \cdot \mathbf{u}_i^T + \mathbf{x}' \cdot \mathbf{e}_i'^T.$$
We may introduce $\mathbf{x}'' = \mathbf{x}'\mathbf{F}^T$ as a length $l$ vector of unknown bits (linear
combinations of bits from $\mathbf{x}'$) and again
$$y_i' = \mathbf{x}'' \cdot \mathbf{u}_i^T + \mathbf{x}' \cdot \mathbf{e}_i'^T.$$
Since we have $\Pr[y_i' = z_i'] = \frac{1}{2}(1 + \epsilon^{2^t})$, we get
$$\Pr[\mathbf{x}'' \cdot \mathbf{u}_i^T = z_i'] = \frac{1}{2}\left(1 + \epsilon^{2^t}(\epsilon')^c\right),$$
where $\epsilon'$ is the bias determined by the expected distance between $\mathbf{g}_i'$ and the
closest codeword in the code we are using, and $c$ is the number of positions in $\mathbf{x}'$
set to one. The last step in the new algorithm now selects about $m \approx 1/(\epsilon^{2^t}\epsilon'^c)^2$
samples $z_1', z_2', \ldots, z_m'$ and, for each guess of the $2^l$ possible values of $\mathbf{x}''$, we
compute how many times $\mathbf{x}'' \cdot \mathbf{u}_i^T = z_i'$ when $i = 1, 2, \ldots, m$. As this step is similar
to a correlation attack scenario, we know that it can be efficiently computed
using the Fast Walsh-Hadamard Transform. After recovering $\mathbf{x}''$, it is an easy task
to recover the remaining unknown bits of $\mathbf{x}'$.
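The counting step can be illustrated with a generic Fast Walsh-Hadamard Transform. In the minimal sketch below (illustrative names; each $\mathbf{u}_i$ is assumed to be given as an $l$-bit integer), we accumulate $f[\mathbf{u}] = \sum_{i : \mathbf{u}_i = \mathbf{u}} (-1)^{z_i'}$. After the transform, entry $\mathbf{y}$ holds $\sum_i (-1)^{z_i' + \mathbf{y} \cdot \mathbf{u}_i^T}$, that is, agreements minus disagreements, so all $2^l$ counts are obtained with about $l 2^l$ additions instead of $m 2^l$.

```python
def fwht(a):
    """In-place, unnormalised Walsh-Hadamard transform of a list of length 2^l."""
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def all_agreements(us, zs, l):
    """For every y in {0,1}^l (as an int), count the i with <y, u_i> = z_i."""
    f = [0] * (1 << l)
    for u, z in zip(us, zs):
        f[u] += 1 if z == 0 else -1    # f[u] = sum of (-1)^{z_i} over samples with u_i = u
    F = fwht(f)                        # F[y] = #agreements(y) - #disagreements(y)
    m = len(us)
    return [(m + c) // 2 for c in F]


# Usage: plant x'' = 0b1011 with l = 4 and noiseless samples.
l, x2 = 4, 0b1011
us = list(range(1 << l)) * 3
zs = [bin(u & x2).count("1") & 1 for u in us]
counts = all_agreements(us, zs, l)
print(counts.index(max(counts)) == x2)
```

With noisy $z_i'$ the correct $\mathbf{x}''$ only agrees with a fraction of about $\frac{1}{2}(1 + \epsilon^{2^t}(\epsilon')^c)$ of the samples, which is why the sample count above scales with the inverse square of that bias.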


4.1 An Example Using Dimension k = 160

In order to illustrate the ideas and convince the reader that the proposed
algorithm can be more efficient than previously known methods, we consider an
example. We assume an LPN instance of dimension $k = 160$, where we allow
at most $2^{24}$ received samples and we allow at most around $2^{24}$ vectors of length
160 to be stored in memory. Furthermore, the error probability is $\eta = 0.1$.

For this particular case, we propose the following algorithm. The first step is
to compute the systematic form, $\hat{\mathbf{G}} = [\mathbf{I}\ \hat{\mathbf{g}}_{k+1}^T\ \hat{\mathbf{g}}_{k+2}^T\ \cdots\ \hat{\mathbf{g}}_n^T]$ and
$$\hat{\mathbf{z}} = \mathbf{z} + [z_1\ z_2\ \cdots\ z_k]\hat{\mathbf{G}} = [\mathbf{0}\ \hat{z}_{k+1}\ \hat{z}_{k+2}\ \cdots\ \hat{z}_n].$$
Here $\hat{\mathbf{G}}$ has dimension 160 and $\hat{\mathbf{z}}$ has length at most $2^{24}$.
In the second step we perform $t = 4$ steps of BKW (using the LF1 approach),
the first step removing 22 bits and the remaining three each removing 21 bits.
This results in $\mathbf{G}' = [\mathbf{I}\ \mathbf{g}_1'^T\ \cdots\ \mathbf{g}_m'^T]$ and $\mathbf{z}' = [\mathbf{0}\ z_1'\ z_2'\ \cdots\ z_m']$, where now
$\mathbf{G}'$ has dimension $75 \times m$ and $m$ is about $3 \cdot 2^{21}$. We have $\mathbf{z}' = \mathbf{x}'\mathbf{G}' + \mathbf{e}'$,
$\Pr[x_i' = 0] = \frac{1}{2}(1 + \epsilon)$, where $\epsilon = 0.8$, and $\Pr[\mathbf{x}' \cdot \mathbf{g}_i'^T = z_i'] = \frac{1}{2}(1 + \epsilon^{16})$. So the
resulting problem has dimension 75 and the bias is $\epsilon^{2^t} = (0.8)^{16}$.

In the third step we then select a suitable code of length 75. In this example
we choose a block code which is a direct sum of 25 [3, 1, 3] repetition codes⁴, i.e.,
the dimension is 25. We map every vector $\mathbf{g}_i'$ to the nearest codeword by simply
selecting chunks of three consecutive bits and replacing them by either 000 or 111.
With probability 3/4 we will change one position and with probability 1/4 we will
not have to change any position. In total we expect to change $(3/4 \cdot 1 + 1/4 \cdot 0) \cdot 25$
positions. The expected weight of the length 75 vector $\mathbf{e}_i'$ is 75/4, so the expected
bias is $\epsilon' = 1/2$. As $\Pr[x_i' = 1] = 0.1$, the expected number of nonzero positions
in $\mathbf{x}'$ is 7.5. Assuming we have only $c = 6$ nonzero positions, we get
$$\Pr[\mathbf{x}'' \cdot \mathbf{u}_i^T = z_i'] = \frac{1}{2}\left(1 + 0.8^{16}\left(\tfrac{1}{2}\right)^6\right) = \frac{1}{2}\left(1 + 2^{-11.15}\right).$$
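The numbers in this step can be checked exactly. The sketch below (illustrative function names) maps a vector to the nearest codeword of the concatenated [3, 1, 3] repetition code by a majority vote in each 3-bit chunk, and then averages the error weight over the eight equally likely chunk values, assuming the bits of $\mathbf{g}_i'$ are uniform. It recovers the expected weight 75/4 and the per-bit bias $\epsilon' = 1/2$; like the text, it treats the bias per bit and ignores the dependence between bits of the same chunk.

```python
from fractions import Fraction
from itertools import product


def nearest_chunk(bits):
    """Nearest codeword of the [3, 1, 3] repetition code: majority vote."""
    return (1, 1, 1) if sum(bits) >= 2 else (0, 0, 0)


def map_to_concatenated_code(g):
    """Map a bit tuple of length 3r to the nearest codeword of r concatenated [3, 1, 3] codes.

    Returns (u, e): u holds the r information bits, e = g + c is the error vector.
    """
    u, e = [], []
    for j in range(0, len(g), 3):
        chunk = tuple(g[j:j + 3])
        c = nearest_chunk(chunk)
        u.append(c[0])
        e.extend(a ^ b for a, b in zip(chunk, c))
    return u, e


# Exact averages over the 8 equally likely values of a uniform 3-bit chunk.
exp_weight_chunk = sum(Fraction(sum(map_to_concatenated_code(ch)[1]), 8)
                       for ch in product((0, 1), repeat=3))
p_error_bit = exp_weight_chunk / 3        # Pr[a given bit of e'_i equals 1]
eps_prime = 1 - 2 * p_error_bit           # bias of a single bit of e'_i
print(exp_weight_chunk, 25 * exp_weight_chunk, eps_prime)
```

Decoding is trivial here because each chunk is decoded independently; this is what makes the concatenated repetition code cheap, even though its covering radius (one error per chunk) is large.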

In the last step we then run through $2^{25}$ values of $\mathbf{x}''$ and for each of them
we compute how often $\mathbf{x}'' \cdot \mathbf{u}_i^T = z_i'$ for $i = 1, \ldots, 3 \cdot 2^{21}$. Again, since we use the Fast
Walsh-Hadamard Transform, the cost of this step is not much more than $2^{25}$
operations. The probability of having no more than 6 ones in $\mathbf{x}'$ is about 0.37,
so we need to repeat the whole process a few times.
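The arithmetic of this example can be reproduced in a few lines. The sketch below uses only the parameters stated above; the sample-count check uses the rule of thumb $m \approx 1/(\epsilon^{2^t}\epsilon'^c)^2$ from earlier in this section, and the success probability assumes the 75 bits of $\mathbf{x}'$ are independent $\mathrm{Ber}_\eta$ variables, as they are after the systematic-form transformation. It also recomputes the $0.8^{-64} = 2^{20.6}$ figure used in the comparison with previous approaches.

```python
from math import comb, log2

k, eta, t = 160, 0.1, 4
k_prime = k - 22 - 3 * 21             # one 22-bit and three 21-bit BKW steps
eps = 1 - 2 * eta                     # bias of a single noise bit
eps_prime, c = 0.5, 6                 # repetition-code bias and assumed weight of x'

bias_total = eps ** (2 ** t) * eps_prime ** c     # 0.8^16 * (1/2)^6
m_needed = 1 / bias_total ** 2                    # m ~ 1/(bias)^2
m_available = 3 * 2 ** 21                         # columns left after the BKW steps

# Pr[wt(x') <= c] for k' i.i.d. Ber_eta bits (binomial tail).
p_success = sum(comb(k_prime, i) * eta ** i * (1 - eta) ** (k_prime - i)
                for i in range(c + 1))

samples_prev = eps ** -64             # 5 BKW steps: bias 0.8^32, squared and inverted

print(f"k' = {k_prime}")
print(f"log2(bias) = {log2(bias_total):.2f}")
print(f"log2(m needed) = {log2(m_needed):.2f}, log2(m available) = {log2(m_available):.2f}")
print(f"Pr[wt(x') <= {c}] = {p_success:.2f}")
print(f"log2(0.8^-64) = {log2(samples_prev):.1f}")
```

The $3 \cdot 2^{21} \approx 2^{22.6}$ available samples exceed the roughly $2^{22.3}$ suggested by the rule of thumb, and a success probability near 0.37 per attempt is what forces the few repetitions mentioned above.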

In comparison with other algorithms, the best approach we can find is the
Kirchner, Bernstein, Lange approach [23,5], where one can do up to 5 BKW
steps. Removing 21 bits in each step leaves 55 remaining bits. Using the Fast
Walsh-Hadamard Transform with $0.8^{-64} = 2^{20.6}$ samples, we can include another 21
bits in this step, but there are still 34 remaining variables that need to be
guessed.

Overall, the simple algorithm sketched above outperforms the best previous
algorithm using optimal parameter values⁵.


Simulation. We have verified in simulation that the proposed algorithm works
in practice. We use a rate R = 1/3 concatenated repetition code and query the
oracle for $2^{24}$ samples. Simple pruning of the samples with too large distance from
the codeword was used to approximate the behaviour of an optimal distinguisher.
The average execution time is ≈ 1.86 seconds on an Apple iMac 3.06 GHz Intel
Core 2 Duo with 4 GB RAM running OS X 10.9 (13A603).

4 In the sequel, we refer to this code construction as the concatenated repetition code.
For this [75, 25, 3] linear code, the covering radius is 25, but we can see from this
example that what matters is the average weight of the error vector, which is much
smaller than 25.

5 Adopting the same method to implement their overlapping steps, for the (160, 1/10)
LPN instance, the Bernstein-Lange algorithm and the new algorithm cost $2^{35.70}$
and $2^{33.83}$ bit operations, respectively. Thus, the latter offers an improvement by
a factor of roughly 4 when solving this small-scale instance.

Algorithm 2. New attacking algorithm

Input: Matrix $\mathbf{G}$ with $k$ rows and $n$ columns, received length $n$ vector $\mathbf{z}$ and
algorithm parameters $t, b, k'', l, w_0, c$

  repeat
      Pick random column permutation $\pi$;
      Perform Gaussian elimination on $\pi(\mathbf{G})$ resulting in $\mathbf{G}_0 = [\mathbf{I} \mid \mathbf{L}_0]$;
      for i = 1 to t do
          Partition the columns of $\mathbf{L}_{i-1}$ by the last $b \cdot i$ bits;
          Denote the set of columns in partition $s$ by $\mathcal{C}_s$;
          Pick a vector $\mathbf{a}_s \in \mathcal{C}_s$;
          for ($\mathbf{a} \in \mathcal{C}_s$) and ($\mathbf{a} \ne \mathbf{a}_s$) do
              $\mathbf{L}_i \leftarrow [\mathbf{L}_i \mid (\mathbf{a} + \mathbf{a}_s)]$;
      Pick a $[k'', l]$ linear code with good covering property;
      Partition the columns of $\mathbf{L}_t$ by the middle non-all-zero $k''$ bits and
          group them by their nearest codewords;
      Set $k_2 = k - tb - k''$;
      for $\mathbf{x}_2' \in \{0, 1\}^{k_2}$ with $\mathrm{wt}(\mathbf{x}_2') \le w_0$ do
          Update the observed samples;
          for $\mathbf{y} \in \{0, 1\}^l$ do
              Use Fast Walsh-Hadamard Transform to compute the
                  numbers of 1s and 0s observed respectively;
              Perform hypothesis testing whose threshold is defined as a
                  function of $c$;
  until acceptable hypothesis is found

5 Algorithm Description

Having introduced the key idea in a simplistic manner, we now formalize it by
stating a new five-step LPN solving algorithm (see Algorithm 2) in detail. Its first
three steps combine several well-known techniques for this problem, namely
changing the distribution of the secret vector [23], sorting and merging to make
the length of samples shorter [3], and partial secret guessing [5]. The efficiency
improvement comes from a novel idea introduced in the last two subsections: if
we employ a linear covering code and rearrange samples according to their nearest
codewords, then subtracting from each column its corresponding codeword yields
the sparse vectors desired in the distinguishing process. We then propose a new
distinguishing technique, subspace hypothesis testing, to remove the influence of
the codeword part using the Fast Walsh-Hadamard Transform. The algorithm
consists of five steps, each described in a separate subsection.




Q. Guo, T. Johansson, and C. Londahl 


5.1 Gaussian Elimination

Recall that our LPN problem is given by $\mathbf{z} = \mathbf{x}\mathbf{G} + \mathbf{e}$, where $\mathbf{z}$ and $\mathbf{G}$ are
known. We can apply an arbitrary column permutation $\pi$ without changing
the problem (but we change the error locations). A transformed problem is
$\pi(\mathbf{z}) = \mathbf{x}\pi(\mathbf{G}) + \pi(\mathbf{e})$. This means that we can repeat the algorithm many times
using different permutations.

Continuing, we multiply by a suitable $k \times k$ matrix $\mathbf{D}$ to bring the matrix
$\mathbf{G}$ to a systematic form, $\hat{\mathbf{G}} = \mathbf{D}\mathbf{G}$. The problem remains the same, except that
the unknowns are now given by the vector $\hat{\mathbf{x}} = \mathbf{x}\mathbf{D}^{-1}$. This is just a change
of variables. As a second step, we also add the codeword $[z_1\ z_2\ \cdots\ z_k]\hat{\mathbf{G}}$ to
our known vector $\mathbf{z}$, resulting in a received vector starting with $k$ zero entries.
Altogether, this corresponds to the change $\hat{\mathbf{x}} = \mathbf{x}\mathbf{D}^{-1} + [z_1\ z_2\ \cdots\ z_k]$.

Our initial problem has been transformed and the problem is now written as
$$\hat{\mathbf{z}} = [\mathbf{0}\ \hat{z}_{k+1}\ \hat{z}_{k+2}\ \cdots\ \hat{z}_n] = \hat{\mathbf{x}}\hat{\mathbf{G}} + \mathbf{e}, \qquad (1)$$
where now $\hat{\mathbf{G}}$ is in systematic form. Note that these transformations do not affect
the noise level. We still have a single noise variable added in every position.
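The transformation of this subsection can be sketched over GF(2) with matrix rows stored as Python integers (bit $i$ of row $r$ is entry $(r, i)$). The helper names are illustrative, and this is plain schoolbook elimination rather than a table-based speed-up. The row operations realise $\hat{\mathbf{G}} = \mathbf{D}\mathbf{G}$ without forming $\mathbf{D}$ explicitly, and adding $[z_1 \cdots z_k]\hat{\mathbf{G}}$ clears the first $k$ received symbols. The usage lines check a consequence of equation (1): the transformed secret is exactly the first $k$ noise bits, i.e., $\hat{\mathbf{z}} = [e_1 \cdots e_k]\hat{\mathbf{G}} + \mathbf{e}$, which is why its bits follow $\mathrm{Ber}_\eta$.

```python
import random


def systematic_form(rows, k):
    """Row-reduce a k x n GF(2) matrix (rows as n-bit ints) so columns 0..k-1 become I.

    Row operations amount to left-multiplying by some invertible D, so the
    result is G_hat = D G. Raises ValueError if the first k columns are
    singular (one would then try another column permutation pi).
    """
    rows = list(rows)
    for col in range(k):
        pivot = next((r for r in range(col, k) if rows[r] >> col & 1), None)
        if pivot is None:
            raise ValueError("first k columns are linearly dependent")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(k):
            if r != col and rows[r] >> col & 1:
                rows[r] ^= rows[col]
    return rows


def vec_times_matrix(v, rows):
    """v * M over GF(2): XOR of the rows selected by the bits of v."""
    out = 0
    for r, row in enumerate(rows):
        if v >> r & 1:
            out ^= row
    return out


def transform_received(z, g_hat, k):
    """z_hat = z + [z_1 ... z_k] G_hat; its first k entries are zero."""
    return z ^ vec_times_matrix(z & ((1 << k) - 1), g_hat)


rng = random.Random(5)
k, n, eta = 8, 40, 0.1
while True:                                   # redraw until the first k columns are invertible
    rows = [rng.getrandbits(n) for _ in range(k)]
    try:
        g_hat = systematic_form(rows, k)
        break
    except ValueError:
        continue
x = rng.getrandbits(k)
e = sum((rng.random() < eta) << i for i in range(n))
z = vec_times_matrix(x, rows) ^ e             # z = x G + e
z_hat = transform_received(z, g_hat, k)
print(z_hat & ((1 << k) - 1) == 0,
      z_hat == vec_times_matrix(e & ((1 << k) - 1), g_hat) ^ e)
```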

A schoolbook implementation of the above Gaussian elimination procedure requires
about $nk^2/2$ bit operations; we propose, however, to reduce its complexity
by using a more sophisticated space-time trade-off technique. We store intermediate
results in tables, and then derive the final result by adding several items
in the tables together. The detailed description is as follows.

For a fixed $s$, divide the matrix $\mathbf{D}$ into $a = \lceil k/s \rceil$ parts, i.e., $\mathbf{D} = [\mathbf{D}_1, \mathbf{D}_2, \ldots, \mathbf{D}_a]$,
where $\mathbf{D}_i$ is a sub-matrix with $s$ columns (except possibly the last matrix $\mathbf{D}_a$).
Then store all possible values of $\mathbf{D}_i\mathbf{x}^T$ for $\mathbf{x} \in \mathbb{F}_2^s$ in tables indexed by $i$, where
$1 \le i \le a$. For a vector $\mathbf{g} = [\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_a]$, the transformed vector is
$$\mathbf{D}\mathbf{g}^T = \mathbf{D}_1\mathbf{g}_1^T + \mathbf{D}_2\mathbf{g}_2^T + \cdots + \mathbf{D}_a\mathbf{g}_a^T,$$
where $\mathbf{D}_i\mathbf{g}_i^T$ can be read directly from the table.

The cost of constructing the tables is about $O(2^s)$, which can be negligible
if memory in the BKW step is much larger. Furthermore, for each column, the
transformation costs no more than $k \cdot a$ bit operations; so, this step requires
$$C_1 = (n - k) \cdot ka < nka$$
bit operations in total if $2^s$ is much smaller than $n$.
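The table look-up trick can be sketched as follows; it is similar in spirit to the "method of the Four Russians" used for linear algebra over GF(2). In this minimal version (illustrative names), $\mathbf{D}$ is given by its $k$ columns as $k$-bit integers, a column $\mathbf{g}$ is a $k$-bit integer, and each of the $a = \lceil k/s \rceil$ tables stores $\mathbf{D}_i\mathbf{x}^T$ for all $\mathbf{x} \in \mathbb{F}_2^{s}$ (the last block may be narrower). Computing $\mathbf{D}\mathbf{g}^T$ then costs $a$ look-ups and XORs of $k$-bit words, matching the $k \cdot a$ bit operations per column counted above.

```python
def build_tables(D_cols, s):
    """For each block D_i of s consecutive columns, tabulate D_i x^T for all x in {0,1}^s.

    D_cols[j] is column j of D as an integer; the last block may be narrower.
    """
    tables = []
    for start in range(0, len(D_cols), s):
        block = D_cols[start:start + s]
        table = [0] * (1 << len(block))
        for v in range(1, len(table)):
            low = (v & -v).bit_length() - 1          # index of the lowest set bit of v
            table[v] = table[v & (v - 1)] ^ block[low]
        tables.append(table)
    return tables


def apply_D(tables, g, s):
    """D g^T as the XOR of a = ceil(k/s) table look-ups (g is a k-bit integer)."""
    out = 0
    for i, table in enumerate(tables):
        out ^= table[(g >> (i * s)) & (len(table) - 1)]
    return out


def apply_D_naive(D_cols, g):
    """Reference: XOR of the columns of D selected by the bits of g."""
    out = 0
    for j, col in enumerate(D_cols):
        if g >> j & 1:
            out ^= col
    return out


D_cols = [(37 * j + 11) % 1024 for j in range(10)]   # an arbitrary 10 x 10 matrix
tables = build_tables(D_cols, 4)                     # blocks of 4, 4 and 2 columns
print(all(apply_D(tables, g, 4) == apply_D_naive(D_cols, g) for g in range(1 << 10)))
```

Each table is filled incrementally (one XOR per entry), so building all of them costs about $a \cdot 2^s$ word operations, which is negligible when $2^s$ is small compared with the number of columns $n$.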


5.2 Collision Procedure

This next step contains the BKW part. The input to this step is $\hat{\mathbf{z}}$ and $\hat{\mathbf{G}}$.

We write $\hat{\mathbf{G}} = [\mathbf{I}\ \mathbf{L}_0]$ and process only the matrix $\mathbf{L}_0$. As the length of $\mathbf{L}_0$ is
typically much larger than the systematic part of $\hat{\mathbf{G}}$, this is roughly no restriction
at all. We then use a sort-and-match technique as in the BKW algorithm,
operating on the matrix $\mathbf{L}_0$. This process will give us a sequence of matrices
denoted $\mathbf{L}_0, \mathbf{L}_1, \mathbf{L}_2, \ldots, \mathbf{L}_t$.

Let us denote the number of columns of $\mathbf{L}_i$ by $r(i)$, with $r(0) = n - k$. Adopting
the LF1 type technique, every step operating on columns will reduce the number
of samples by $2^b$, yielding $m = r(t) = r(0) - t2^b$. Apart from the process
of creating the $\mathbf{L}_i$ matrices, we need to update the received vector in a similar
fashion. A simple way is to put $\hat{\mathbf{z}}$ as a first row in the representation of $\hat{\mathbf{G}}$.

This procedure ends with a matrix $[I\ L_t]$, where $L_t$ has its last $tb$ entries in each column all equal to zero. By discarding the last $tb$ rows, we obtain a matrix of dimension $k - tb$ that can be written as $G' = [I\ L_t]$, and we have a corresponding received vector $z' = [0\ z'_1\ z'_2 \cdots z'_m]$. The first $k' = k - tb$ positions are only affected by a single noise variable, so we can write

$$[0, z'] = x'G' + [e_1\, e_2 \cdots e_{k'},\ \tilde{e}_1\, \tilde{e}_2 \cdots \tilde{e}_m], \qquad (2)$$

for some unknown vector $x'$, where $\tilde{e}_i = \sum_{i_j \in \mathcal{T}_i,\, |\mathcal{T}_i| \le 2^t} e_{i_j}$ and $\mathcal{T}_i$ contains the positions that have been added up to form the $(k'+i)$th column of $G'$. By the piling-up lemma, the bias of $\tilde{e}_i$ becomes $\epsilon^{2^t}$, i.e., it decreases exponentially in the number $t$ of BKW steps.
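The piling-up claim can be checked numerically. The sketch below is our own illustration: it computes the exact distribution of the XOR of `count` independent $\mathrm{Ber}_\eta$ bits by folding them in one at a time, and compares the resulting bias $1 - 2\Pr[1]$ with $\epsilon^{\text{count}}$, where $\epsilon = 1 - 2\eta$. With count $= 2^t$ this is the bias of $\tilde{e}_i$.

```python
import math

def xor_bias(eta, count):
    """Exact bias 1 - 2 Pr[XOR = 1] of the XOR of `count` iid Ber(eta) bits."""
    p_one = 0.0
    for _ in range(count):
        # Pr[new XOR = 1] = Pr[old = 1] Pr[bit = 0] + Pr[old = 0] Pr[bit = 1]
        p_one = p_one * (1 - eta) + (1 - p_one) * eta
    return 1 - 2 * p_one

# Usage: eta = 1/8 and t = 6 BKW steps (as in Table 2), i.e. 2^t = 64 noise terms.
eta, t = 1 / 8, 6
eps = 1 - 2 * eta
print(math.isclose(xor_bias(eta, 2 ** t), eps ** (2 ** t), rel_tol=1e-5))   # True
```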

We denote the complexity of this step by $C_2$, where

$$C_2 = \sum_{i=1}^{t} (k + 1 - ib)(n - i2^b) \approx (k + 1)tn.$$


5.3 Partial Secret Guessing Procedure 

The previous procedure outputs $G'$ with dimension $k' = k - tb$ and $m = n - k - t2^b$ columns. We removed the bottom $tb$ bits of $x$ to form the length-$k'$ vector $x'$, with $z' = x'G' + e$.

We now divide $x'$ into two parts: $x' = [x'_1\ x'_2]$, where $x'_1$ is of length $k''$. In this step, we simply guess all vectors $x'_2 \in \mathbb{F}_2^{k'-k''}$ such that $\mathrm{wt}(x'_2) \le w_0$ for some $w_0$, and update the observed vector $z'$ accordingly. This transforms the problem into that of attacking a new, smaller LPN problem of dimension $k''$ with the same number of samples. Firstly, note that this only works if $\mathrm{wt}(x'_2) \le w_0$, and we denote this probability by $P(w_0, k' - k'')$. Secondly, we need to be able to distinguish a correct guess from incorrect ones, and this is the task of the remaining steps. The complexity of this step is


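$$C_3 = m \sum_{i=0}^{w_0} \binom{k' - k''}{i}\, i,$$

since applying a guess of weight $i$ means adding $i$ rows of length $m$ to $z'$.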
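The guessing step and its cost can be sketched as follows. This is our own illustration: `G2` is a name we introduce for the $k' - k''$ rows of $G'$ that multiply $x'_2$, each row and $z'$ are lists of $m$ bits, and the function counts $m \cdot i$ bit operations for a guess of weight $i$, so the total reproduces $C_3$.

```python
from itertools import combinations
from math import comb

def guesses(z, G2, w0):
    """Yield (support of x2, updated z', bit operations) for all x2 with wt(x2) <= w0."""
    m = len(z)
    for w in range(w0 + 1):
        for support in combinations(range(len(G2)), w):
            updated = list(z)
            for row in support:                    # remove x2's contribution to z'
                updated = [a ^ b for a, b in zip(updated, G2[row])]
            yield support, updated, w * m

def c3(m, k_diff, w0):
    """C_3 = m * sum_{i=0}^{w0} C(k' - k'', i) * i."""
    return m * sum(comb(k_diff, i) * i for i in range(w0 + 1))

# Usage: with k' - k'' = 10 and w0 = 2 (the (512, 1/8) row of Table 2), 56 guesses are tried.
print(sum(1 for _ in guesses([0] * 4, [[0] * 4] * 10, 2)))   # 56
```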


5.4 Covering-Coding Method 

In this step, we use a $[k'', l]$ linear code $\mathcal{C}$ with covering radius $d_C$ to group the columns. That is, we rewrite

$$g'_i = c_i + e'_i,$$

where $c_i$ is the nearest codeword in $\mathcal{C}$, and $\mathrm{wt}(e'_i) \le d_C$. The employed linear code is characterized by a systematic generator matrix $F = [I\ F']_{l \times k''}$; we thus obtain a corresponding parity-check matrix $H = [F'^T\ I]_{(k''-l) \times k''}$.

There are several ways to select a code. One way of realizing the above grouping idea is a table-based syndrome decoding technique. The procedure is as follows: 1) We construct a constant-query-time table containing $2^{k''-l}$ items, each of which stores a syndrome and its corresponding minimum-weight error vector. 2) Once the syndrome $Hg_i'^T$ is computed, we find its corresponding error vector $e'_i$ by checking in the table; adding them together yields the nearest codeword $c_i$.
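The table-based syndrome decoding can be sketched on a toy code. As an assumption made only for illustration, we take the $[7, 4]$ Hamming code in the systematic form $F = [I\ F']$ (so $k'' = 7$, $l = 4$, covering radius 1); in the attack, $\mathcal{C}$ is a much longer code. Vectors are lists of bits, and the names are ours.

```python
from itertools import combinations, product

# Toy systematic code: F = [I | F'] with F' of size l x (k'' - l).
F_PRIME = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
L, K2 = len(F_PRIME), len(F_PRIME) + len(F_PRIME[0])

# Parity-check matrix H = [F'^T | I] of size (k'' - l) x k''.
H = [[F_PRIME[j][i] for j in range(L)] + [int(i == c) for c in range(K2 - L)]
     for i in range(K2 - L)]

def syndrome(v):
    return tuple(sum(h & x for h, x in zip(row, v)) % 2 for row in H)

def build_table():
    """Step 1): map every syndrome to a minimum-weight error vector (2^(k''-l) entries)."""
    table = {}
    for w in range(K2 + 1):                      # enumerate errors by increasing weight
        for support in combinations(range(K2), w):
            e = [int(i in support) for i in range(K2)]
            table.setdefault(syndrome(e), e)
        if len(table) == 2 ** (K2 - L):
            break
    return table

TABLE = build_table()

def nearest_codeword(g):
    """Step 2): look up e' from the syndrome H g^T; c = g + e' is a nearest codeword."""
    e = TABLE[syndrome(g)]
    return [a ^ b for a, b in zip(g, e)], e

# Usage: every vector of F_2^7 is within distance 1 of a codeword.
for g in product([0, 1], repeat=K2):
    c, e = nearest_codeword(list(g))
    assert syndrome(c) == (0,) * (K2 - L) and sum(e) <= 1
```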

The remaining task is to calculate the syndromes efficiently. We sort the vectors $g'_i$, where $0 \le i \le m$, according to their first $l$ bits, and group them into $2^l$ partitions denoted by $V_j$ for $1 \le j \le 2^l$. Starting from the partition $V_1$ whose first $l$ bits are all zero, we can derive the syndrome by reading the last $k'' - l$ bits without any additional computational cost. Since $H = [F'^T\ I]$, two vectors with the same first $l$ bits have syndromes that differ exactly by the sum of their last $k'' - l$ bits. Hence, if we know one syndrome in $V_j$, we can compute another syndrome in the same partition within $2(k'' - l)$ bit operations, and another in a different partition whose first $l$-bit vector has Hamming distance 1 from that of $V_j$ within $3(k'' - l)$ bit operations (the extra $k'' - l$ operations add one column of $F'^T$). Therefore, the complexity of this step is

$$C_4 = (k'' - l)(2m + 2^l).$$
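The two syndrome-update rules behind $C_4$ can be checked with a short sketch of our own. Bits are packed into integers, `fp_rows[i]` stands for row $i$ of $F'$, and the helper names are hypothetical. The syndrome of $g = [\text{head} \mid \text{tail}]$ is $F'^T$ applied to the first $l$ bits plus the last $k'' - l$ bits.

```python
import random

def syndrome(fp_rows, head, tail):
    """H g^T for g = [head | tail], with H = [F'^T | I]."""
    s = tail
    for i, row in enumerate(fp_rows):
        if (head >> i) & 1:
            s ^= row
    return s

def same_partition(s1, tail1, tail2):
    """Same first l bits: syndromes differ by tail1 + tail2 (about 2(k''-l) bit ops)."""
    return s1 ^ tail1 ^ tail2

def neighbour_partition(s1, tail1, tail2, fp_rows, j):
    """First l bits differ only in bit j: also add row j of F' (about 3(k''-l) bit ops)."""
    return s1 ^ tail1 ^ tail2 ^ fp_rows[j]

# Usage on random data with l = 8 and k'' - l = 12.
l, r = 8, 12
fp_rows = [random.getrandbits(r) for _ in range(l)]
head, t1, t2 = random.getrandbits(l), random.getrandbits(r), random.getrandbits(r)
s1 = syndrome(fp_rows, head, t1)
assert same_partition(s1, t1, t2) == syndrome(fp_rows, head, t2)
assert neighbour_partition(s1, t1, t2, fp_rows, 3) == syndrome(fp_rows, head ^ (1 << 3), t2)
```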

Notice that the selected linear code determines the syndrome table, which can be pre-computed with complexity $O(k''2^{k''-l})$. The optimal parameters suggest that this cost is acceptable compared with the total attack complexity.

The expected distance to the nearest codeword determines the bias $\epsilon'$ in $e'_i$. This plays an important role in the later hypothesis testing step: if we arrange the columns $e'_i$ as a matrix, then it is sparse; therefore, we can view the $i$th value in one column as a random variable $R_i$ distributed according to $\mathrm{Ber}_{d/k''}$, where $d$ is the expected distance. We can bound it by the covering radius.⁶ Moreover, if the bias is large enough, then it is reasonable to consider $R_i$, for $1 \le i \le k''$, as independent variables.


5.5 Subspace Hypothesis Testing 

Group the samples $(g'_i, z'_i)$ in sets $L(c_i)$ according to their nearest codewords and define the function $h(c_i)$ as

$$h(c_i) = \sum_{(g'_i, z'_i) \in L(c_i)} (-1)^{z'_i}.$$

6 In the sequel, we replace the covering radius by the sphere-covering bound to estimate the expected distance $d$, i.e., $d$ is the smallest integer s.t. $\sum_{i=0}^{d}\binom{k''}{i} \ge 2^{k''-l}$. We give more explanation in Section 8.

The employed systematic linear code $\mathcal{C}$ describes a bijection between the linear space $\mathbb{F}_2^l$ and the set of all codewords in $\mathbb{F}_2^{k''}$; moreover, due to its systematic feature, the corresponding information vector appears explicitly in the first $l$ bits of each codeword. We can thus define a new function


$$g(u) = h(c_i),$$

such that $u$ represents the first $l$ bits of $c_i$ and exhausts all the points in $\mathbb{F}_2^l$. The Walsh transform of $g$ is defined as

$$G(y) = \sum_{u \in \mathbb{F}_2^l} g(u)(-1)^{\langle y, u\rangle}.$$

Here we exhaust all candidates $y \in \mathbb{F}_2^l$ by computing the Walsh transform.

The following lemma illustrates why we can perform hypothesis testing on the subspace $\mathbb{F}_2^l$.

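Lemma 2. There exists a unique vector $y \in \mathbb{F}_2^l$ such that, for every codeword $c_i = uF$,

$$\langle y, u\rangle = \langle x', c_i\rangle.$$

Proof. As $c_i = uF$, we obtain

$$\langle x', c_i\rangle = x' F^T u^T = \langle x' F^T, u\rangle.$$

Thus, the vector $y = x'F^T$ fulfills the requirement. Uniqueness follows because the values $\langle y, u\rangle$ for all $u \in \mathbb{F}_2^l$ (in particular for the unit vectors) determine $y$ completely.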
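Lemma 2 is a one-line identity, but a small numerical check may help. The sketch below is our own: bits are lists, $F$ is a random $l \times k''$ matrix, and the helper names are hypothetical. It forms $y = x'F^T$ and verifies $\langle y, u\rangle = \langle x', uF\rangle$ for every $u \in \mathbb{F}_2^l$.

```python
import random
from itertools import product

def dot(a, b):
    return sum(x & y for x, y in zip(a, b)) % 2

def times_matrix(u, F):
    """Row vector u (length l) times F (l x k'') over F_2, i.e. the codeword uF."""
    return [sum(u[i] & F[i][j] for i in range(len(F))) % 2 for j in range(len(F[0]))]

def lemma2_y(x, F):
    """y = x' F^T: the i-th entry is <x', row i of F>."""
    return [dot(x, row) for row in F]

def check_lemma2(x, F):
    y = lemma2_y(x, F)
    return all(dot(y, u) == dot(x, times_matrix(u, F))
               for u in product([0, 1], repeat=len(F)))

# Usage: l = 4, k'' = 9.
F = [[random.randint(0, 1) for _ in range(9)] for _ in range(4)]
x = [random.randint(0, 1) for _ in range(9)]
assert check_lemma2(x, F)
```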

Given the candidate $y$, $G(y)$ is the difference between the number of predicted 0s and the number of predicted 1s for the bit $z'_i + \langle x', c_i\rangle$. If $y$ is the correct guess, then this bit is distributed according to $\mathrm{Ber}_{(1-\epsilon^{2^t}(\epsilon')^{w})/2}$, where $\epsilon' = 1 - 2d/k''$ and $w$ is the weight of $x'$; otherwise, it is considered random. Thus, the best candidate $y_0$ is the one that maximizes the absolute value of $G(y)$, i.e., $y_0 = \arg\max_{y \in \mathbb{F}_2^l} |G(y)|$, and we need approximately $1/(\epsilon^{2^{t+1}} \cdot (\epsilon')^{2w})$ samples to distinguish these two cases. Note that false positives are quickly detected in an additional step and this does not significantly increase the complexity.

Since the weight $w$ is unknown, we assume that $w \le c$ and then query for samples. If the assumption is valid, we can distinguish the two distributions correctly; otherwise, we obtain a false positive, which can be recognized without much cost, and then choose another permutation to run the algorithm again. The procedure continues until we find the secret vector $x$.

We use the Fast Walsh-Hadamard Transform technique to accelerate the distinguishing step. As the hypothesis testing runs for every guess of $x'_2$, the overall complexity of this step is

$$C_5 = l\,2^{l} \sum_{i=0}^{w_0} \binom{k' - k''}{i}.$$
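The Fast Walsh-Hadamard Transform computes all $2^l$ values $G(y)$ with $l \cdot 2^l$ additions and subtractions. The sketch below is our own illustration: $u$ and $y$ are encoded as $l$-bit integers, so $\langle y, u\rangle$ is the parity of `y & u`. It also shows the argmax rule on a noiseless toy input in which $g(u)$ is proportional to $(-1)^{\langle y_0, u\rangle}$.

```python
def fwht(values):
    """Return G with G[y] = sum_u values[u] * (-1)^{<y,u>}; len(values) must be 2^l."""
    G = list(values)
    h = 1
    while h < len(G):                          # l butterfly passes of 2^l operations each
        for start in range(0, len(G), 2 * h):
            for j in range(start, start + h):
                a, b = G[j], G[j + h]
                G[j], G[j + h] = a + b, a - b
        h *= 2
    return G

def best_candidate(g):
    """y_0 = argmax_y |G(y)|."""
    G = fwht(g)
    return max(range(len(G)), key=lambda y: abs(G[y]))

# Usage: l = 5, secret y0 = 0b10110, ten noiseless samples per codeword.
l, y0 = 5, 0b10110
g = [10 * (-1) ** (bin(y0 & u).count("1") % 2) for u in range(2 ** l)]
print(best_candidate(g) == y0)   # True
```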
6 Analysis

In the previous section we already indicated the complexity of each step. We now put it together in a single complexity estimate. We first formulate the probability $P(w, m)$ of having at most $w$ errors in $m$ positions as follows:

$$P(w, m) = \sum_{i=0}^{w} \binom{m}{i} \eta^{i} (1-\eta)^{m-i}.$$

Therefore, the success probability in one iteration is $P(w_0, k' - k'')\,P(c, k'')$. In each iteration, the complexity accumulates step by step, which leads to the following theorem.

Theorem 1 (The complexity of Algorithm 2). Let $n$ be the number of samples required and $a, t, b, w_0, c, l, k''$ be algorithm parameters. For the LPN instance with parameter $(k, \eta)$, the number of bit operations required for a successful run of the new attack is equal to $2^{f(k,n,a,t,b,w_0,c,l,k'',\eta)}$, where $f(k,n,a,t,b,w_0,c,l,k'',\eta)$ is a function⁷ defined as follows,



f(k,n,a,t,b,w 0 ,c,l,k",r]) = 

log 2 ^ ank + b 2 h ^—~ — ^ ^ — - — ((fc + 1)2 6 + nb) + (k + 1 )tn 

+{k" - l)( 2 (n - t 2 h ) + 2‘) + i2‘ £ (*')+<" “ < 2<> ) £ (^)>) 

-log, (£(l-,)‘-v(^)) -log, (*")) (3) 

under the condition that

$$n - t2^{b} > 1/(\epsilon^{2^{t+1}} \cdot (\epsilon')^{2c}), \qquad (4)$$

where $\epsilon = 1 - 2\eta$, $\epsilon' = 1 - 2d/k''$, and $d$ is the smallest integer s.t. $\sum_{i=0}^{d}\binom{k''}{i} \ge 2^{k''-l}$.

Proof. The complexity in one iteration is $C_1 + C_2 + C_3 + C_4 + C_5$, and the expected number of iterations is the inverse of $P(w_0, k_1)P(c, k'')$; the overall complexity, therefore, is $C^*$, where

$$C^* = \frac{C_1 + C_2 + C_3 + C_4 + C_5}{P(w_0, k_1)\,P(c, k'')}.$$

Substituting the detailed formulas into the above expression completes the proof. Condition (4) ensures that we have enough samples to determine the right guess with high probability. □
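The pieces of this analysis can be assembled into a small calculator. This is a hedged sketch of our own, not the authors' script, and it is not claimed to reproduce Table 2 digit for digit. It uses only the step costs $C_1$ to $C_5$ as stated in Section 5 (with the bound $C_1 \le nka$), $P(w, m)$ as defined above, $d$ from the sphere-covering bound, and condition (4). Arguments follow Theorem 1, with `k2` standing for $k''$.

```python
from math import comb, log2

def P(w, m, eta):
    """Probability that m iid Ber(eta) bits contain at most w ones."""
    return sum(comb(m, i) * eta**i * (1 - eta)**(m - i) for i in range(w + 1))

def sphere_covering_d(k2, l):
    """Smallest d with sum_{i<=d} C(k2, i) >= 2^(k2 - l)."""
    total, d = 1, 0
    while total < 2 ** (k2 - l):
        d += 1
        total += comb(k2, d)
    return d

def log2_cost(k, eta, n, a, t, b, w0, c, l, k2):
    """log2 of C* = (C1 + ... + C5) / (P(w0, k1) P(c, k''))."""
    k1 = k - t * b - k2                       # footnote 7
    m = n - k - t * 2**b                      # columns left after the t BKW steps
    C1 = n * k * a
    C2 = sum((k + 1 - i * b) * (n - i * 2**b) for i in range(1, t + 1))
    C3 = m * sum(comb(k1, i) * i for i in range(w0 + 1))
    C4 = (k2 - l) * (2 * m + 2**l)
    C5 = l * 2**l * sum(comb(k1, i) for i in range(w0 + 1))
    total = C1 + C2 + C3 + C4 + C5
    return log2(total) - log2(P(w0, k1, eta)) - log2(P(c, k2, eta))

def enough_samples(eta, n, t, b, c, l, k2):
    """Condition (4): n - t 2^b > 1 / (eps^(2^(t+1)) * eps'^(2c))."""
    eps = 1 - 2 * eta
    eps_prime = 1 - 2 * sphere_covering_d(k2, l) / k2
    return n - t * 2**b > 1 / (eps ** (2 ** (t + 1)) * eps_prime ** (2 * c))

# Usage with the (512, 1/8) parameters of Table 2 and n = 2^66.3.
print(log2_cost(512, 1 / 8, 2**66.3, 9, 6, 63, 2, 16, 64, 124))
```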


7 The symbol $k_1$ denotes $k - tb - k''$ for notational simplicity.
7 Results

We now present numerical results of the new algorithm attacking three key LPN instances, as shown in Table 2. All three target 80-bit security. The first one has parameter (512, 1/8), widely accepted in various LPN-based cryptosystems (e.g., HB+ [18], HB# [12], LPN-C [13]) after the suggestion of Levieil and Fouque [26]; the second one has increased length (532, 1/8) and is adopted as the parameter of the irreducible Ring-LPN instance employed in Lapin [16]; and the last one is a new design parameter⁸ that we recommend for future use. The attacks on the different protocols are detailed below. We note that the new algorithm is significant not only for the above applications but also for some LPN-based cryptosystems without explicit parameter settings (e.g., [22]).


Table 2. The complexity for solving different LPN instances

LPN instance   Parameters                            log2 n   log2 C*
               t    a    b    l    k''   w0   c
(512, 1/8)     6    9    63   64   124   2    16     66.3     79.92
(532, 1/8)     6    9    65   66   130   2    17     68.0     81.82
(592, 1/8)     6    10   70   64   137   3    18     72.7     88.07


7.1 HB+ 

In [26], Levieil and Fouque proposed an active attack on HB+ by choosing the random vector $a$ from the reader to be 0. To achieve 80-bit security, they suggested adjusting the lengths of the secret keys to 80 and 512, respectively, instead of both being 224. Its security is based on the assumption that the LPN instance with parameter (512, 1/8) can resist attacks in $2^{80}$ bit operations. But we break it in $2^{79.9}$ bit operations, thereby directly yielding an active attack on the 80-bit security of the HB+ authentication protocol.

7.2 LPN-C and HB# 

Using similar structures, Gilbert et al. proposed two different cryptosystems, one for authentication (HB#) and the other for encryption (LPN-C). By setting the random vector from the reader and the message vector to be both 0, we obtain an active attack on the HB# authentication protocol and a chosen-plaintext attack on LPN-C, respectively. As each protocol comes in both a secure version (random-HB# and LPN-C) and an efficient version (HB# and Toeplitz LPN-C), we analyze them separately.

8 This instance requires $2^{82.3}$ bits of memory using the new algorithm, and could withstand all existing attacks at the security level of $2^{80}$ bit operations.
Using Toeplitz Matrices. A Toeplitz matrix is a matrix in which each descending diagonal from left to right is constant. Thus, when a Toeplitz matrix is employed as the secret and we have recovered its first column, only one bit in its second column is unknown. So the problem is transformed into solving a new LPN instance with parameter (1, 1/8). We then deduce the third column, the fourth column, and so forth. The typical parameter settings for the number of columns (denoted by $m$) are 441 for HB#, and 80 (or 160) for Toeplitz LPN-C. In either case, the cost of determining the columns other than the first is bounded by $2^{40}$, negligible compared with that of attacking one (512, 1/8) LPN instance. Therefore, we break the 80-bit security of these "efficient" versions that use Toeplitz matrices.
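The Toeplitz observation can be made concrete. In a matrix that is constant along each descending diagonal, column $j+1$ is column $j$ shifted down by one position, with a single new entry on top. The sketch below is our own illustration with hypothetical function names. Once the first column is known, only that top bit of the next column is unknown, which is why each further column reduces to a (1, 1/8) LPN instance.

```python
import random

def toeplitz(first_col, first_row):
    """Matrix T with T[i][j] = first_col[i - j] if i >= j else first_row[j - i]."""
    assert first_col[0] == first_row[0]
    rows, cols = len(first_col), len(first_row)
    return [[first_col[i - j] if i >= j else first_row[j - i] for j in range(cols)]
            for i in range(rows)]

def next_column(prev_col, new_top_bit):
    """Column j+1 of a Toeplitz matrix: one new bit on top, the rest shifted down."""
    return [new_top_bit] + prev_col[:-1]

# Usage: a random 6 x 5 binary Toeplitz matrix.
first_col = [random.randint(0, 1) for _ in range(6)]
first_row = [first_col[0]] + [random.randint(0, 1) for _ in range(4)]
T = toeplitz(first_col, first_row)
col0 = [row[0] for row in T]
col1 = [row[1] for row in T]
assert next_column(col0, T[0][1]) == col1
```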


Random Matrix Case. If the secret matrix is chosen totally at random, then there is no simple connection between different columns to exploit. One strategy is to attack column by column, thereby deriving an algorithm whose complexity is that of attacking a (512, 1/8) LPN instance multiplied by the number of columns. That is, if $m = 441$, then the overall complexity is about $2^{79.9} \times 441 \approx 2^{88.7}$ (since $\log_2 441 \approx 8.8$). We may slightly improve the attack by exploiting the fact that the different columns share the same random vector in each round.


7.3 Lapin with an Irreducible Polynomial 

In [16], Heyse et al. use a (532, 1/8) Ring-LPN instance with an irreducible polynomial to achieve 80-bit security. We show here that this parameter setting is not secure enough for Lapin to thwart attacks at the level of $2^{80}$. Although the new attack on a (532, 1/8) LPN instance requires $2^{81.8}$ bit operations, which is larger than $2^{80}$, there are two key issues to consider: 1) the Ring-LPN problem is believed to be no harder than the standard LPN problem⁹; 2) we perform BKW steps using the LF1 setting in the new algorithm, but may obtain a more efficient attack in practice when adopting the LF2 heuristic, whose effectiveness has been demonstrated in the implementation part of [26]. We suggest increasing the size of the irreducible polynomial employed in Lapin for 80-bit security.

8 More on the Covering-Coding Method 

In this section we describe further aspects of the covering-coding technique, the most novel and essential step in the new algorithm.


Sphere-Covering Bound. We use the sphere-covering bound to estimate the bias $\epsilon'$ contributed by the new technique, for two reasons. Firstly, there is a well-known conjecture [7] in coding theory that the covering density approaches 1 asymptotically as the code length goes to infinity. Thus, it is sensible to assume this for a "good" code when the code length $k''$ is relatively large. Secondly, we could see from the previous example that the key feature desired is a linear code with low average error weight, which is smaller than its covering radius. From this perspective, the covering bound gives a good estimate.

9 For the instance in Lapin using a quotient ring modulo the irreducible polynomial $x^{532} + x + 1$, it is possible to optimize the procedure for inverting a ring element, thereby resulting in a more efficient attack than the generic one.

By concatenating five [23, 12] Golay codes, we construct a [115, 60] linear code¹⁰ with covering radius 15 (each perfect Golay block has covering radius 3). Its expected error-vector weight is quite close to the sphere-covering bound for these parameters (with a gap of only 1). We believe in the existence of linear codes with length around 125, rate approximately 1/2, and average error weight that reaches the sphere-covering bound. For explicit code constructions, see [15].
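The numbers in the Golay example can be reproduced with a few lines. We assume, for this sketch of our own, that the vector to be decoded is uniformly distributed. Then, in each perfect [23, 12] block, the coset-leader weight is $i$ with probability $\binom{23}{i}/2^{11}$ for $i \le 3$. The sketch computes the expected error weight of the concatenated [115, 60] code and the sphere-covering value $d$ for the same parameters.

```python
from fractions import Fraction
from math import comb

def sphere_covering_d(n, k):
    """Smallest d with sum_{i<=d} C(n, i) >= 2^(n - k)."""
    total, d = 1, 0
    while total < 2 ** (n - k):
        d += 1
        total += comb(n, d)
    return d

# The [23, 12] Golay code is perfect: its 2^11 cosets have leaders of weight 0..3.
golay_weight = Fraction(sum(i * comb(23, i) for i in range(4)), 2 ** 11)
expected_weight = 5 * golay_weight            # five independent blocks
print(float(expected_weight), sphere_covering_d(115, 60))
```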

Using Soft Information. The weight of the error vector $e'_i$ differs for different values of $i$, so the confidence level varies across samples. However, the Fast Walsh-Hadamard Transform inherently assumes a constant confidence level over all samples; thus, it is not an optimal distinguishing method. For optimal distinguishing, soft-information methods such as likelihood ratio tests are required. We show how to fully exploit soft distinguishing in the longer version of the paper [15].

Attacking Public-Key Cryptography. We know various decodable covering codes that could be employed in the new algorithm, e.g., linear codes of rate about 1/2 that are table-based syndrome decodable, and concatenated codes built on Hamming codes, Golay codes, repetition codes, etc. For the cryptographic schemes targeted in this paper, i.e., HB variants, LPN-C, and Lapin with an irreducible polynomial, the first three are efficient. In the realm of public-key cryptography (e.g., schemes proposed by Alekhnovich [?], Damgård and Park [?], and Duc and Vaudenay [10]), however, the situation changes. These systems base their security on LPN instances with huge secret length (tens of thousands) and extremely low error probability (less than half a percent). Given the competitive average error weight shown by the example in Section 4.1, concatenations of repetition codes with much lower rate therefore seem more applicable: with low-rate codes, we remove more bits when using the covering-coding method.


Alternative Collision Procedure. Although the covering-coding method is employed only once in the new algorithm, we could derive numerous variants, and among them one may find a more efficient attack. For example, we could replace one or two steps in the later stage of the collision procedure by adding together two vectors decoded to the same codeword. This alternative technique is similar to that invented by Lamberger et al. in [24,25] for finding near-collisions of hash functions. By this procedure, we could eliminate more bits in one step at the cost of increasing the error rate; this is a trade-off, and the concrete parameter setting should be analyzed more thoroughly later.

10 Using this code, we stand at the margin of breaking the 80-bit security of (512, 1/8) LPN instances, with time complexity only $2^{80.5}$ and query complexity $2^{66.2}$.

9 Conclusions 

In this paper we have described a new algorithm for solving the LPN problem that employs an approximation technique using covering codes together with a subspace hypothesis testing technique to determine the value of linear combinations of the secret bits. Complexity estimates show that the algorithm beats all the previous approaches; in particular, we can present academic attacks on instances of LPN that have been suggested in different cryptographic primitives.

The new technique has only been described in a rather simplistic manner, 
due to space limitations. There are a few obvious improvements, one being the 
use of soft decoding techniques and another one being the use of more powerful 
constructions of good codes. There are also various modified versions that need 
to be further investigated. One such idea is to use the new technique inside a 
BKW step, thereby removing more bits in each step at the expense of introducing 
another contribution to the bias. An interesting open problem is whether these 
ideas can improve the asymptotic behavior of the BKW algorithm. 
A Illustrating the Procedure

In this section, we give an intuitive illustration of the subspace hypothesis test, performed as follows.

[Illustration: the secret x and the query matrix; each column g_i is rewritten as a codeword c_i = uF plus a discrepancy e'_i.]

We can separate the discrepancy $e'_i$ from $uF$, which yields

[Illustration omitted.]

Finally, we note that $x'_1F^T \in \mathbb{F}_2^l$, where $l < k''$. A simple transformation yields

[Illustration omitted.]

Since $\mathrm{wt}(e'_i) \le d_C$, the contribution from $\langle x'_1, e'_i\rangle$ is very small.
Abstract. In this paper, we present a new algebraic attack against some special cases of Wild McEliece Incognito, a generalization of the original McEliece cryptosystem. This attack does not threaten the original McEliece cryptosystem. We prove that recovering the secret key for such schemes is equivalent to solving a system of polynomial equations whose solutions have the structure of a usual vector space. Consequently, to recover a basis of this vector space, we can greatly reduce the number of variables in the corresponding algebraic system. From these solutions, we can then deduce the basis of a GRS code. Finally, the last step of the cryptanalysis of those schemes corresponds to attacking a McEliece scheme instantiated with particular GRS codes (with a polynomial relation between the support and the multipliers), which can be done in polynomial time thanks to a variant of the Sidelnikov-Shestakov attack. For Wild McEliece and Incognito, we also show that solving the corresponding algebraic system is notably easier in the case of a non-prime base field $\mathbb{F}_q$. To support our theoretical results, we have been able to practically break several parameters defined over a non-prime base field $q \in \{9, 16, 25, 27, 32\}$, $t \le 6$, extension degrees $m \in \{2, 3\}$, and security level up to $2^{129}$ against information set decoding, in a few minutes or hours.

Keywords: Public-key cryptography, McEliece cryptosystem, algebraic 
cryptanalysis. 


1 Introduction 

Algebraic cryptanalysis is a general attack technique which reduces the security of a cryptographic primitive to the difficulty of solving a non-linear system of equations. Although the efficiency of general polynomial system solvers such as Gröbner bases, SAT solvers, etc., is constantly progressing, such algorithms all face the intrinsic hardness of solving polynomial equations. As a consequence, the success of an algebraic attack relies crucially on the ability to find the best modelling in terms of algebraic equations.

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, PART I, LNCS 8873, pp. 21 [Til 2014. 
In [14,15], Faugère, Otmani, Perret and Tillich (FOPT) show, in particular, that the key recovery of McEliece [20] can be reduced to solving a system of non-linear equations. This key-recovery system can be greatly simplified for so-called compact variants of McEliece, e.g. [?], leading to an efficient attack against various compact schemes [?]. However, it is not clear whether the FOPT attack could be efficient against non-compact variants of McEliece, the bottleneck being the huge number of variables and the high degree of the equations involved in the algebraic modelling.

We present a novel algebraic modelling that applies to the original McEliece system and to generalizations such as Wild McEliece [6] and Wild McEliece Incognito [8]. Note, however, that the resulting attack works only in some special cases, and in particular does not work for the original McEliece system. Wild McEliece uses Wild Goppa codes, that is, Goppa codes over $\mathbb{F}_q$, $q \ne 2$, with a Goppa polynomial of the form $\Gamma^{q-1}$ ($\Gamma$ being a univariate polynomial of low degree). This form of the Goppa polynomial, generalizing the form used in the original McEliece system for $q = 2$, makes it possible to increase the number of errors that can be added to a message (in comparison to a random Goppa polynomial of the same degree). In [8], Bernstein, Lange, and Peters generalized this idea by using Goppa polynomials of the form $f\Gamma^{q-1}$, with $f$ another univariate polynomial. We shall call such Goppa codes Masked Wild Goppa codes. Like the authors of [8], we refer to this version as Wild McEliece Incognito. All in all, Wild McEliece and Wild McEliece Incognito allow users to select parameters that resist all known attacks, in particular the FOPT algebraic attack, with a resistance similar to that of binary Goppa codes but with much smaller keys. The security of Wild McEliece defined over a quadratic extension has recently been investigated in [?], where the authors presented a polynomial-time attack on the key when $t = \deg(\Gamma) > 1$.

1.1 Our Contributions 

We present a completely new algebraic attack dedicated to Wild McEliece and Wild McEliece Incognito. To do so, we show that key recovery for such schemes is equivalent to finding a basis of a vector space which is hidden in the zero-set of an algebraic system. To our knowledge, this is a new computational problem that has never appeared in algebraic cryptanalysis before. Compared to the FOPT algebraic attack on McEliece, our modelling intrinsically involves fewer variables. Informally, the multiplicity of the Goppa polynomial implies that the solutions of the algebraic system considered here have the structure of a vector space. When the base field is $\mathbb{F}_q$ with $q > 2$, this simplifies its resolution. For instance, for a Wild McEliece Incognito scheme with parameters ($q = 32$, $m = 2$, $n = 864$, $t = 2$, $\deg(f) = 36$), we end up with an algebraic system having only 9 variables (the FOPT modelling would require algebraic equations with 1060 variables in the same situation). On a very high level, our attack proceeds in two main steps.

1. Polynomial System Solving. We have to solve a non-linear system of equations whose zero-set forms, unexpectedly, a vector space of some known dimension $d$. Consequently, we can reduce the number of variables by fixing $d$ variables in the initial system, and repeat the solving step several times to recover a basis of the vector space of solutions. This is the most computationally difficult part of the attack.

2. Linear Algebra to Recover the Secret Key. The second phase processes the solutions obtained in the first step so as to obtain a private description that allows decoding for the public key as efficiently as with the private key. It involves computing intersections of vector spaces, solving linear systems, and polynomial interpolation. Thus, this part can be done efficiently, i.e., in polynomial time.

We detail below the main ingredients of our attack. 

An Algebraic Modelling with a Vector Space Structure on the Zero Set. Let $G_{pub} = (g_{ij})_{0 \le i \le k-1,\, 0 \le j \le n-1} \in \mathbb{F}_q^{k \times n}$ be the public matrix of a Wild McEliece Incognito scheme. We denote by $m$ its extension degree, and set $t = \deg(\Gamma)$. Our attack considers the system

$$\mathcal{W}_{q,\alpha}(\mathbf{Z}) = \bigcup_{u \in V_\alpha} \Big\{ \sum_{j=0}^{n-1} g_{ij} Z_j^{u} = 0 \;\Big|\; 0 \le i \le k-1 \Big\}, \qquad (1)$$

with $V_\alpha = \{1, 2, \ldots, p^\alpha - 1\} \cup \{p^\alpha, p^{\alpha+1}, \ldots, q\}$ being a subset of $\{1, \ldots, q\}$.

As a comparison, the modelling of Faugère, Otmani, Perret and Tillich [14] will necessarily introduce variables $\mathbf{X} = (X_0, \ldots, X_{n-1})$, $\mathbf{Y} = (Y_0, \ldots, Y_{n-1})$ and $\mathbf{W} = (W_0, \ldots, W_{n-1})$ for all the support and multipliers (that is, the vectors $\mathbf{y} = \Gamma(\mathbf{x})^{-1}$ and $\mathbf{w} = f(\mathbf{x})^{-1}$). In [14], the system is as follows:

$$\Big\{ \sum_{j=0}^{n-1} g_{ij} Y_j X_j^{\ell} = 0 \;\Big|\; 0 \le i \le k-1,\ 0 \le \ell \le r-1 \Big\},$$

where $r$ is the degree of the Goppa polynomial. In our context, [14] would induce a system containing monomials of the forms $Y_j^{e_y} X_j^{e_x}$ and even $Y_j^{e_y} X_j^{e_x} W_j^{e_w}$ (for some $(e_x, e_y, e_w)$). Here, we use a single vector of variables $\mathbf{Z} = (Z_0, \ldots, Z_{n-1})$ and write very simple homogeneous equations. The secret key $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{w}$ will be recovered from $\mathbf{Z}$, but in a second step. The main advantage of this approach (Theorem 2) is that the solutions of $\mathcal{W}_{q,\alpha}(\mathbf{Z})$ have a very unexpected property for a non-linear system: they form a vector space. This allows us to reduce the number of "free" unknowns in $\mathcal{W}_{q,\alpha}(\mathbf{Z})$ by the dimension of the solution space. For example, we end up with a system containing only 9 variables for an Incognito scheme with parameters ($q = 32$, $m = 2$, $n = 864$, $t = 2$, $\deg(f) = 36$). The algebraic description of Goppa codes proposed in [14] would require algebraic equations with 1060 variables for the same parameters.

To be more precise, the vector space underlying the solutions of (1) is closely related to Generalized Reed-Solomon (GRS) codes.

Definition 1 (Generalized Reed-Solomon codes). Let $\mathbf{x} = (x_0, \ldots, x_{n-1}) \in (\mathbb{F}_{q^m})^n$ where all $x_i$'s are distinct and $\mathbf{y} = (y_0, \ldots, y_{n-1}) \in (\mathbb{F}_{q^m}^*)^n$. The
Generalized Reed-Solomon code of dimension $t$, denoted by $\mathrm{GRS}_t(\mathbf{x}, \mathbf{y})$, is defined as follows:

$$\mathrm{GRS}_t(\mathbf{x}, \mathbf{y}) \stackrel{\text{def}}{=} \big\{ (y_0 Q(x_0), \ldots, y_{n-1} Q(x_{n-1})) \;\big|\; Q \in \mathbb{F}_{q^m}[z],\ \deg(Q) \le t-1 \big\}.$$

We shall call $\mathbf{x}$ the support of the code, and $\mathbf{y}$ the multipliers.

Theorem 2 shows that the solutions of $\mathcal{W}_{q,\alpha}(\mathbf{Z})$ contain a vector space which is generated by sums of codewords of Generalized Reed-Solomon (GRS) codes derived from $(\mathbf{x}, \mathbf{y})$, where $(\mathbf{x}, \mathbf{y})$ is a key equivalent to the secret key. We explain later more precisely how we can take advantage of this special structure for solving (1) and recovering a basis of the vector subspace.

A Method to Isolate a GRS Code from a Sum of GRS Codes. From a basis of this sum of GRS codes, we want to recover a basis of the code $\mathrm{GRS}_t(\mathbf{x}, \mathbf{y})$. We refer to this phase as the disentanglement. We present our solution in Section 4; it relies on a well-chosen intersection of codes. It is rigorously proved in characteristic 2 (Proposition 6). For other characteristics, we ran more than 100,000 experiments and observed that Proposition 6 still held in all cases.

A Sidelnikov-Shestakov-Like Algorithm Recovering the Goppa Polynomial. Given a basis of a Generalized Reed-Solomon code $\mathrm{GRS}_t(\mathbf{x}, \mathbf{y})$, the Sidelnikov-Shestakov attack [26] consists in recovering the secret pair of vectors $(\mathbf{x}, \mathbf{y})$. It is well known that the Sidelnikov-Shestakov attack works in polynomial time. In our case, we have to address a slight variant of this problem: there is a polynomial relation $\Gamma(z)$ linking $\mathbf{x}$ and $\mathbf{y}$, which is part of the private key. We provide an adaptation of [26] to obtain a key $(\mathbf{x}', \mathbf{y}', \Gamma')$ equivalent to the secret key, also in polynomial time. We are unaware of such an algorithm published so far.

A Weakness of Codes Defined over Non-prime Base Fields. Independently of our algebraic attack, we prove a general result about Goppa codes defined over $\mathbb{F}_q$ (with $q = p^s$, $p$ prime and $s > 0$) whose Goppa polynomials have a factor $\Gamma(z)$ with multiplicity $q$. We show in Section 5 that the coordinate vectors over $\mathbb{F}_p$ of the codewords of such a public code are codewords of a Wild Goppa code, defined over $\mathbb{F}_p$, with the same secret support and Goppa polynomial $\Gamma(z)^p$ (Theorem 8). In other words, this construction gives access, from the public key, to a new code involving the same private elements. As a consequence, using non-prime base fields reveals more information on the secret key than expected by the designers, and any key-recovery attack can benefit from it. This is thus an intrinsic weakness of Goppa codes defined over non-prime base fields. In our context, this property provides additional linear equations between the variables $Z_i$ of system (1). We can reduce the number of variables from $(p^s - 1)mt$ to $(p - 1)mst$ essential variables, which makes codes defined over fields $\mathbb{F}_q$ with $q = p^s$ notably weaker (Corollary 10).
1.2 Impact of Our Work

In order to evaluate the efficiency of our attack, we considered various parameters for which [6] said that the strength is "unclear" and that an attack would not be a "surprise", but for which no actual attack was known.

Information Set Decoding (ISD) is a generic decoding technique which allows message recovery. This technique has been intensively studied since 1988 (e.g. [?]) and remains the reference for choosing secure parameters in code-based cryptography. The latest results from [24] have been used to generate the parameters for Wild McEliece and Wild McEliece Incognito.

In [6, Table 7.1], numerous keys are presented which illustrate the key-size reduction as the size of the field $q$ grows. Another consequence of increasing $q$ is pointed out by the authors of [6]: the low number of irreducible polynomials in $\mathbb{F}_{q^m}[z]$ entails a possible vulnerability to the SSA structural attack ([18,25]). Although the designers provide a protection (using non-full-support codes) such that [18] is completely infeasible today, they warn that further progress in [18] may jeopardize the parameters with $q > 9$ and thus estimate that those parameters have unclear security. Our experiments reveal that, in the case of non-prime base fields, it is already possible to recover the secret key in a few minutes with our attack using off-the-shelf tools (Magma [9] V2.19-1).

Getting around the alleged vulnerability to SSA was the main motivation for proposing Incognito: in [8, Table 5.1], the authors propose parameters considered fully secure, as all ISD complexities are above $2^{128}$ and the numbers of possible Goppa polynomials are greater than $2^{256}$. It turns out that, in the case of non-prime base fields, the extra shield introduced in Incognito is not a protection against our attack. We can practically break the recommended parameters for $q \in \{16, 27, 32\}$. However, we could not solve (in less than two days) the algebraic systems involved for extension degrees $m \ge 4$ or $t \ge 7$, and for codes over $\mathbb{F}_p$ with $p$ prime. So, our attack does not threaten the original McEliece cryptosystem. To conclude, we highlight that Theorem 2 is valid for all Goppa codes whose Goppa polynomial has multiplicities, and should therefore be taken into account by designers in the future. Figure 1 provides a diagram which recapitulates all the steps performed to solve the system and recover the secret key.


2 Coding Theory Background 


Let $\mathbb{F}_q$ be a finite field of $q = p^s$ elements (p prime, and s > 0). To define
conveniently the various kinds of codes we will deal with, we introduce the
following Vandermonde-like matrices:



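$$V_t(x,y) = \begin{pmatrix} y_0 & \cdots & y_{n-1} \\ y_0 x_0 & \cdots & y_{n-1} x_{n-1} \\ \vdots & & \vdots \\ y_0 x_0^{t-1} & \cdots & y_{n-1} x_{n-1}^{t-1} \end{pmatrix},$$

where $x = (x_0, \ldots, x_{n-1})$ and $y = (y_0, \ldots, y_{n-1})$ are vectors of $\mathbb{F}_{q^m}^n$.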


Fig. 1. Overview of the attack


With suitable x and y, the rows of such matrices $V_t(x,y)$ define Generalized
Reed-Solomon (GRS) codes (Definition 1). Alternant and Goppa codes can be
viewed as the restriction of duals of GRS codes to the base field $\mathbb{F}_q$.

Definition 2 (Alternant/Goppa Codes). Let $x = (x_0, \ldots, x_{n-1}) \in \mathbb{F}_{q^m}^n$
where all the $x_i$'s are distinct, and let $y \in (\mathbb{F}_{q^m}^*)^n$. The alternant code of order t is
defined as $\mathscr{A}_t(x,y) \stackrel{\text{def}}{=} \{ c \in \mathbb{F}_q^n \mid V_t(x,y)\, c^T = 0 \}$. As for GRS codes, x is
the support and y the multipliers. Let $g(z) \in \mathbb{F}_{q^m}[z]$ be of degree t satisfying
$g(x_i) \neq 0$ for all i, $0 \leq i \leq n-1$. We define the Goppa code over $\mathbb{F}_q$ associated
to g(z) as the code $\mathscr{G}_q(x, g(z)) \stackrel{\text{def}}{=} \mathscr{A}_t(x,y)$, with $y = g(x)^{-1}$. The dimension
k of $\mathscr{G}_q(x, g(z))$ satisfies $k \geq n - tm$. The polynomial g(z) is called the Goppa
polynomial, and m is the extension degree. Equivalently, $\mathscr{G}_q(x, g(z))$ can be
defined as:


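$$\mathscr{G}_q(x, g(z)) \stackrel{\text{def}}{=} \left\{ (c_0, \ldots, c_{n-1}) \in \mathbb{F}_q^n \;\middle|\; \sum_{i=0}^{n-1} \frac{c_i}{z - x_i} = 0 \bmod g(z) \right\}.$$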


Goppa codes naturally inherit a decoding algorithm that corrects up to $\lfloor t/2 \rfloor$ errors.
This bound can be improved to correct more errors by using Wild Goppa codes,
introduced by Bernstein, Lange, and Peters in [6]. We also recall the version
of Wild Goppa codes used in Wild McEliece Incognito [8]. We call this special
version of Wild Goppa codes Masked Wild Goppa codes.

Definition 3 (Wild Goppa/Masked Wild Goppa). Let x be an n-tuple
$(x_0, \ldots, x_{n-1})$ of distinct elements of $\mathbb{F}_{q^m}$. Let $\Gamma(z) \in \mathbb{F}_{q^m}[z]$ (resp. $f(z) \in
\mathbb{F}_{q^m}[z]$) be a squarefree polynomial of degree t (resp. u) satisfying $\Gamma(x_i) \neq 0$ (resp.
$f(x_i) \neq 0$) for all i, $0 \leq i \leq n-1$. A Wild Goppa code is a Goppa code whose
Goppa polynomial is of the form $g(z) = \Gamma(z)^{q-1}$. A Masked Wild Goppa code
is a Wild Goppa code whose Goppa polynomial is such that $g(z) = f(z)\Gamma(z)^{q-1}$.

The reason for using those Goppa polynomials lies in the following result. 

Theorem 1 ([6,8]). Let the notations be as in Definition 3. It holds that

$$\mathscr{G}_q\big(x, f(z)\Gamma^{q-1}(z)\big) = \mathscr{G}_q\big(x, f(z)\Gamma^{q}(z)\big). \qquad (3)$$

Thus, the code $\mathscr{G}_q(x, f(z)\Gamma^q(z))$ has dimension $\geq n - m((q-1)t + u)$.

This is a generalization of a well-known property for q = 2. The advantage
of Wild Goppa codes (i.e. f = 1) compared to standard Goppa codes is that
$\lfloor qt/2 \rfloor$ errors can be decoded efficiently (instead of $\lfloor (q-1)t/2 \rfloor$) for the same
code dimension ($n - (q-1)mt$ in most cases). In fact, we can decode up to
$\lfloor qt/2 \rfloor + 2$ errors using list decoding. This increases the difficulty of the syndrome
decoding problem. Hence, for a given level of security, codes with smaller keys
can be used (for details, see [6, Section 7] and [8, Section 5]).
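As a quick numerical check of Theorem 1 and of the decoding radii quoted above, the minimal Python sketch below evaluates the lower bound $n - m((q-1)t + u)$ on the dimension and the two radii $\lfloor qt/2 \rfloor$ and $\lfloor (q-1)t/2 \rfloor$ (the latter two for f = 1). The function names are hypothetical. The sample values are parameter sets taken from Table 2 with m = 3, for which the bound coincides with the listed dimension k.

```python
def dimension_lower_bound(n: int, q: int, m: int, t: int, u: int = 0) -> int:
    """Theorem 1: dim G_q(x, f(z) Gamma(z)^q) >= n - m((q - 1)t + u).

    t = deg(Gamma), u = deg(f) (u = 0 for a plain Wild Goppa code).
    """
    return n - m * ((q - 1) * t + u)


def decoding_radii(q: int, t: int) -> tuple[int, int]:
    """Efficiently decodable errors for f = 1: (wild Goppa, plain Goppa of same dimension)."""
    return (q * t) // 2, ((q - 1) * t) // 2


# Parameter sets from Table 2 (m = 3): (n, q, m, t, u, listed k)
for n, q, m, t, u, k in [(1312, 27, 3, 3, 0, 1078), (1206, 25, 3, 3, 25, 915), (728, 9, 3, 6, 14, 542)]:
    print(q, m, t, u, dimension_lower_bound(n, q, m, t, u), k)

print(decoding_radii(27, 3))  # (40, 39)
```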


3 An Algebraic Modelling with a Vector Space Structure 
on the Zero Set 


The core idea of our attack is to construct, from the public matrix, an
algebraic system whose solution set S has a very surprising structure (Definition
4). It appears that S includes the union of several vector spaces. These vector
spaces are in fact sums of GRS codes (Definition 1) which have almost
the same support x and multiplier vector y as the public key of the attacked Wild
McEliece Incognito scheme (Theorem 2). These vectors give a key equivalent to
the secret key.


3.1 Description of the New Modelling 


We consider the following algebraic equations: 

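Definition 4. Let $q = p^s$ (p prime and s > 0). Let $G_{pub} = (g_{i,j})_{0 \leq i \leq k-1,\, 0 \leq j \leq n-1}$ be a
generator matrix of a masked Wild Goppa code $\mathscr{C}_{pub} = \mathscr{G}_q(x, f(z)\Gamma^{q-1}(z))$. For
an integer a, $0 < a \leq s$, we define the system $\mathcal{W}_{q,a}(Z)$ as follows:

$$\mathcal{W}_{q,a}(Z) = \bigcup_{u \in V_a} \left\{ \sum_{j=0}^{n-1} g_{i,j} Z_j^u = 0 \;\middle|\; 0 \leq i \leq k-1 \right\} \qquad (4)$$

with $V_a = \{1, 2, \ldots, p^a - 1\} \cup \{p^a, p^{a+1}, \ldots, p^s\}$.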
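The exponent set $V_a$ of Definition 4 is easy to enumerate. The short Python sketch below (the function name is hypothetical) lists it for $q = p^s$. For a = s it returns all of $\{1, \ldots, q\}$. For p = 2 and a = 1 it contains only powers of 2, which give only “linear” equations (this is why a = 1 is not used in characteristic 2 in Section 6).

```python
def exponents_V(p: int, s: int, a: int) -> list[int]:
    """Exponent set V_a = {1, ..., p^a - 1} U {p^a, p^(a+1), ..., p^s} (q = p^s)."""
    if not 0 < a <= s:
        raise ValueError("Definition 4 requires 0 < a <= s")
    small = list(range(1, p**a))                 # 1, 2, ..., p^a - 1
    frobenius = [p**r for r in range(a, s + 1)]  # p^a, p^(a+1), ..., p^s
    return small + frobenius


print(exponents_V(2, 3, 3))  # q = 8, a = s: [1, 2, 3, 4, 5, 6, 7, 8]
print(exponents_V(2, 3, 2))  # q = 8, a = 2: [1, 2, 3, 4, 8]
print(exponents_V(3, 3, 1))  # q = 27, a = 1: [1, 2, 3, 9, 27]
```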
The parameter a in $V_a$ determines the exponents considered for the $Z_j$'s in the
system (4). For a = s, we consider all the powers $Z_j^u$ where u ranges in $\{1, \ldots, q\}$.
Removing some exponents leads to a system with fewer equations and may seem
counter-intuitive at first sight (the more equations, the easier it usually is to solve a
polynomial system). However, the situation is different here due to the specific
structure of the solutions of $\mathcal{W}_{q,a}(Z)$, described in the following theorem.

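Theorem 2. Let the notations be as in Definition 4. Let $y = \Gamma(x)^{-1}$, $t = \deg(\Gamma)$
and $C_a = \bigcup_{0 \leq r \leq s-1-a} \{p^r, 2p^r, \ldots, (p-1)p^r\} \cup \{p^{s-a}\}$. The solutions S
of $\mathcal{W}_{q,a}(Z)$ contain the union of m vector spaces which are sums of GRS codes:

$$\bigcup_{e=0}^{m-1} \Big( \sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell) \Big)^{q^e} \subseteq S,$$

with $\mathrm{GRS}_t(x^\ell, y^\ell)^{q^e}$ denoting all the elements of $\mathrm{GRS}_t(x^\ell, y^\ell)$ with coordinates
raised to the power $q^e$, with $0 \leq e \leq m-1$.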

Remark 1. When all the powers $\{1, \ldots, q\}$ are considered in the system, that is
a = s, then $C_a$ is reduced to $\{1\}$ and the solution set is a union of GRS codes. If
a < s, the solution set is a bit more complex, but it has the great advantage of
having a larger dimension, which allows the system to be solved more efficiently.
We formalize this in Section 3.2.
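Similarly, the index set $C_a$ of Theorem 2 can be enumerated directly from its definition. In the sketch below (hypothetical function name), a = s gives $C_a = \{1\}$ as stated in Remark 1, and q = 8, a = 2 gives $\{1, 2\}$, i.e. the sum $\mathrm{GRS}_t(x,y) + \mathrm{GRS}_t(x^2,y^2)$ of Example 1. Since the sets $\{p^r, \ldots, (p-1)p^r\}$ are disjoint for distinct r and do not contain $p^{s-a}$, one also has $\#C_a = (s-a)(p-1) + 1$, which the last assertion checks.

```python
def frobenius_set_C(p: int, s: int, a: int) -> list[int]:
    """C_a = U_{0 <= r <= s-1-a} {p^r, 2p^r, ..., (p-1)p^r}  U  {p^(s-a)}."""
    if not 0 < a <= s:
        raise ValueError("need 0 < a <= s")
    c = {j * p**r for r in range(s - a) for j in range(1, p)}  # r = 0, ..., s-1-a
    c.add(p ** (s - a))
    return sorted(c)


print(frobenius_set_C(2, 3, 3))  # q = 8, a = s: [1]
print(frobenius_set_C(2, 3, 2))  # q = 8, a = 2: [1, 2]
print(frobenius_set_C(3, 3, 1))  # q = 27, a = 1: [1, 2, 3, 6, 9]
assert all(len(frobenius_set_C(p, s, a)) == (s - a) * (p - 1) + 1
           for p in (2, 3, 5) for s in range(1, 5) for a in range(1, s + 1))
```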

Note that Theorem 2 only describes a subset of the solutions. In practice, as the
system is highly overdefined, we always observed that this subset was the whole
solution set.

Proof. The full proof of this result is postponed to Appendix A.3. We just give the
global idea of the proof. The goal is to show that the elements of $\big(\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)\big)^{q^e}$
are solutions of $\mathcal{W}_{q,a}(Z)$. We can assume w.l.o.g. that e = 0.

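Let $z = (z_0, \ldots, z_{n-1}) \in \sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)$. We write the coordinates of z
as $z_j = \sum_{\ell \in C_a} y_j^\ell Q_\ell(x_j^\ell)$, where the $Q_\ell$'s are polynomials of degree $\leq t-1$ of
$\mathbb{F}_{q^m}[z]$. We have to prove that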


$$\sum_{j=0}^{n-1} g_{i,j} z_j^u = 0 \quad \text{for } u \in V_1 \cup V_2, \text{ where } V_1 = \{1, 2, \ldots, p^a - 1\},\ V_2 = \{p^a, \ldots, p^s\}.$$

The idea is to expand $z_j^u = \big(\sum_{\ell \in C_a} y_j^\ell Q_\ell(x_j^\ell)\big)^u$ with the multinomial formula.
The expansion is performed slightly differently depending on whether $u \in V_1$ or $u \in V_2$
(see Appendix A.3). In both cases, we end up with a result of the form
$z_j^u = \sum_{u_x, u_y} a_{u_x,u_y} y_j^{u_y} x_j^{u_x}$, so that our sum writes:


$$\sum_{j=0}^{n-1} g_{i,j} z_j^u = \sum_{u_x, u_y} a_{u_x,u_y} \Big( \sum_{j=0}^{n-1} g_{i,j}\, y_j^{u_y} x_j^{u_x} \Big). \qquad (5)$$


Then, we apply the next lemma (proved in Appendix A.2).
Lemma 3. Let $G_{pub}$ be a generator matrix of a masked Wild Goppa code $\mathscr{C}_{pub} =
\mathscr{G}_q(x, f(z)\Gamma^{q-1}(z))$, $y = \Gamma(x)^{-1}$, $w = f(x)^{-1}$ and $t = \deg(\Gamma(z))$. The values of
x, y, and w satisfy the following set of equations for any value of $u_x, u_y, u, b$
verifying the conditions $0 \leq u_y \leq q$, $0 \leq u_x \leq u_y t - 1$, $0 \leq u \leq \deg(f) - 1$, $b \in
\{0, 1\}$ and $(b, u_y) \neq (0, 0)$:



We set b = 0 and obtain that $\sum_{j=0}^{n-1} g_{i,j} y_j^{u_y} x_j^{u_x} = 0$ for $(u_y, u_x)$ such that
$1 \leq u_y \leq q$ and $0 \leq u_x \leq u_y t - 1$. Thus, to conclude that $\sum_{j=0}^{n-1} g_{i,j} z_j^u = 0$, we check
that all the pairs $(u_x, u_y)$ appearing in the sum (5) satisfy those conditions.


3.2 Recovering a Basis of the Vector Subspace 

We now explain more precisely how to use the particular structure of the solution
set to solve the non-linear system (4). When looking for a vector in a subspace
of $\mathbb{F}_{q^m}^n$ of dimension d, one can (for a generic choice of coordinates) fix d
coordinates arbitrarily and complete the remaining n − d so as to obtain a vector
of this subspace. This corresponds to computing the intersection of the subspace
with d hyperplanes. With this idea, we deduce the following corollary of Theorem 2.

Corollary 4. Let $\mathscr{C}_{pub} = \mathscr{G}_q(x, f(z)\Gamma^{q-1}(z))$ be a masked Wild Goppa code. Let
$t = \deg(\Gamma)$, and let $\mathcal{W}_{q,a}(Z)$ and $C_a$ be as defined in Theorem 2. Then, we can fix
$t \times \#C_a$ variables to arbitrary values in $\mathcal{W}_{q,a}(Z)$. The system obtained has m solutions
(counted without multiplicities), one for each sum $\big(\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)\big)^{q^e}$.
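The hyperplane argument behind Corollary 4 can be illustrated on a toy example. The minimal Python sketch below works over a prime field GF(p) standing in for $\mathbb{F}_{q^m}$ (an assumption made to keep the code self-contained), and the function names are hypothetical. Fixing the first d coordinates of a vector of a d-dimensional subspace amounts to solving a d × d linear system in the coefficients of a basis; this system has a unique solution exactly when the projection of the subspace on these coordinates is bijective, which is the generic situation. Fixing the prefix to a unit vector mirrors the choice $Z_i = 1$, $Z_j = 0$ made in the text.

```python
def solve_mod_p(A: list[list[int]], b: list[int], p: int) -> list[int]:
    """Solve the square system A c = b over GF(p), p prime, by Gauss-Jordan elimination."""
    n = len(A)
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("these coordinates do not determine a unique vector")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)  # modular inverse (Python 3.8+)
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [(v - factor * w) % p for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def vector_with_fixed_prefix(basis: list[list[int]], prefix: list[int], p: int) -> list[int]:
    """Return the vector of span(basis) whose first d = len(basis) coordinates equal prefix."""
    d, n = len(basis), len(basis[0])
    # Coefficients c satisfy sum_k c_k * basis[k][j] = prefix[j] for 0 <= j < d.
    A = [[basis[k][j] for k in range(d)] for j in range(d)]
    c = solve_mod_p(A, prefix, p)
    return [sum(c[k] * basis[k][j] for k in range(d)) % p for j in range(n)]


# A 2-dimensional subspace of GF(7)^5; fixing 2 coordinates pins down one vector.
basis = [[1, 2, 3, 4, 5], [0, 1, 1, 2, 3]]
print(vector_with_fixed_prefix(basis, [1, 0], 7))  # [1, 0, 1, 0, 6]
print(vector_with_fixed_prefix(basis, [0, 1], 7))  # [0, 1, 1, 2, 3]
```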

In the rest of this article, we set $A_{a,t} = t \times \#C_a$. Our purpose is to find a
basis of one of the vector spaces $\big(\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)\big)^{q^e}$. To do so, we pick $A_{a,t}$
independent solutions of $\mathcal{W}_{q,a}(Z)$ by fixing the variables $Z_0, Z_1, \ldots, Z_{A_{a,t}-1}$
accordingly. Namely, for $0 \leq i \leq A_{a,t} - 1$, we pick one solution $v^{(i)}$
among the m solutions of the system

$$\mathcal{W}_{q,a}(Z) \cup \{ Z_i = 1,\ Z_j = 0 \mid 0 \leq j \neq i \leq A_{a,t} - 1 \}.$$

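Thanks to Theorem 2 and Definition 1, we know that those solutions can be
written as follows, for $Q_{i,\ell} \in \mathbb{F}_{q^m}[z]$ of degree lower than t and $0 \leq e_i \leq m-1$:

$$v^{(i)} = \Big( \big( \textstyle\sum_{\ell \in C_a} y_j^\ell\, Q_{i,\ell}(x_j^\ell) \big)^{q^{e_i}} \Big)_{0 \leq j \leq n-1} \in \Big( \sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell) \Big)^{q^{e_i}}, \quad \text{with } v^{(i)}_i = 1 \text{ and } v^{(i)}_j = 0 \text{ for } 0 \leq j \neq i \leq A_{a,t}-1. \qquad (6)$$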


After $A_{a,t}$ resolutions of $\mathcal{W}_{q,a}(Z)$, the solutions do not necessarily form a basis
of one of the vector spaces $\big(\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)\big)^{q^e}$, because the Frobenius
exponents $e_i$ need not be identical for all the $v^{(i)}$'s. We explain in the next paragraph why
this is not an issue in practice.

Simplification: Frobenius Alignment. Let $\{v^{(i)}\}_{0 \leq i \leq A_{a,t}-1}$ be as defined in (6).
We can suppose without loss of generality that $e_0 = e_1 = \cdots = e_{A_{a,t}-1}$. This
simplification requires fewer than $m A_{a,t}$ Frobenius evaluations on the solutions.
Indeed, $v \in \big(\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)\big)^{q^e}$ implies that $v^q \in \big(\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)\big)^{q^{e+1}}$,
so each solution is aligned after at most m − 1 applications of the Frobenius map.
For the parameters considered in [6,8], m and t are rather small, making the cost
of the Frobenius alignment negligible. In the rest of this article, we assume that
$e_0 = \cdots = e_{A_{a,t}-1} = 0$; this is no loss of generality, since the private
elements of $\mathscr{C}_{pub}$ are only defined up to the Frobenius endomorphism.
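Both facts used in the alignment can be checked in a toy field. The Python sketch below models $\mathbb{F}_{q^m}$ as $\mathbb{F}_{16}$ with q = 4 and m = 2 (the modulus $z^4 + z + 1$ and all helper names are illustrative assumptions). It checks that $x \mapsto x^q$ is additive, so it maps a vector space onto a vector space, and that applying it m times is the identity; hence each solution has at most m Frobenius conjugates to try.

```python
MOD = 0b10011  # z^4 + z + 1, irreducible over GF(2): GF(16) = GF(2)[z]/(MOD)
Q, M = 4, 2    # view GF(16) as F_{q^m} with q = 4 and m = 2


def gf16_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(16) encoded as 4-bit integers (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:  # reduce modulo z^4 + z + 1
            a ^= MOD
    return r


def gf16_pow(a: int, e: int) -> int:
    """Compute a^e in GF(16) by repeated multiplication (e is small here)."""
    r = 1
    for _ in range(e):
        r = gf16_mul(r, a)
    return r


def conjugates(v: list[int]) -> list[list[int]]:
    """The m Frobenius conjugates v, v^q, ..., v^(q^(m-1)) (coordinate-wise powers)."""
    out = [list(v)]
    for _ in range(M - 1):
        out.append([gf16_pow(c, Q) for c in out[-1]])
    return out


# x -> x^q is additive (addition is XOR), and m applications of it give x^(q^m) = x.
print(all(gf16_pow(a ^ b, Q) == gf16_pow(a, Q) ^ gf16_pow(b, Q) for a in range(16) for b in range(16)))  # True
print(all(gf16_pow(a, Q**M) == a for a in range(16)))  # True
print(len(conjugates([1, 2, 3, 7])))  # 2
```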

Example 1. Pick for instance q = 8 and solve the system $\mathcal{W}_{q,a}$ with a = 2.
Thanks to Theorem 2, after realignment of the Frobenius exponents, we have a
basis of the vector space $\mathrm{GRS}_t(x,y) + \mathrm{GRS}_t(x^2,y^2)$, that is:

$$\left\{ \big(y_i Q(x_i) + y_i^2 R(x_i^2)\big)_{0 \leq i \leq n-1} \;\middle|\; Q, R \in \mathbb{F}_{q^m}[z],\ \deg(Q), \deg(R) \leq t-1 \right\}.$$

4 Recovering the Secret Key from a Sum of GRS: A Linear Algebra Step

Once we know a basis $(v^{(i)})_{0 \leq i \leq A_{a,t}-1}$ of $\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)$, we aim at
recovering a basis of a single GRS code. This disentanglement is done in Section
4.1. Then, we show in Section 4.2 how to recover a private support x and Goppa
polynomial $\Gamma(z)$ of the masked Wild Goppa code. This gives the full description of a plain
Wild Goppa code. In the Incognito case (deg(f) > 0), we explain in Section 4.3 that an
extra linear step enables us to find f. To sum up, the purpose of this section is to
prove the following theorem.

Theorem 5. Let $q = p^s$ (p prime and s > 0). Let $G_{pub} = (g_{i,j})_{0 \leq i \leq k-1,\, 0 \leq j \leq n-1}$ be a
generator matrix of a masked Wild Goppa code $\mathscr{C}_{pub} = \mathscr{G}_q(x, f(z)\Gamma^{q-1}(z))$. Let
$y = \Gamma(x)^{-1}$, $t = \deg(\Gamma)$, and

$$V = \sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell) \quad \text{where } C_a = \bigcup_{r=0}^{s-1-a} \{p^r, 2p^r, \ldots, (p-1)p^r\} \cup \{p^{s-a}\}.$$

Once V is given, we can recover in polynomial time a support $x'$ and polynomials
$f'(z), \Gamma'(z) \in \mathbb{F}_{q^m}[z]$ such that $\mathscr{C}_{pub} = \mathscr{G}_q(x', f'(z)\Gamma'(z)^{q-1})$. Stated differently, we
can recover in polynomial time a key $(x', \Gamma', f')$ equivalent to the secret key as
soon as the system has been solved.


4.1 Disentanglement of the System Solutions 

The Sidelnikov-Shestakov [26] attack is a well-known attack against McEliece
schemes instantiated with GRS codes [22]. In our case, we have a sum
of GRS codes. In this situation, it does not seem possible to apply [26] directly,
because the vectors of $\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)$ do not have the desired form, that
is $(y_0 Q(x_0), \ldots, y_{n-1} Q(x_{n-1}))$. To overcome this issue, we propose to use
well-chosen intersections to recover a basis suitable for Sidelnikov-Shestakov. To gain
intuition, we provide a small example.
Example 2. We continue with Example 1. By squaring all the elements of
$\mathrm{GRS}_t(x,y) + \mathrm{GRS}_t(x^2,y^2)$, we obtain a basis of $\mathrm{GRS}_t(x^2,y^2) + \mathrm{GRS}_t(x^4,y^4)$:

$$\left\{ \big(y_i^2 Q(x_i^2) + y_i^4 R(x_i^4)\big)_{0 \leq i \leq n-1} \;\middle|\; Q, R \in \mathbb{F}_{q^m}[z],\ \deg(Q), \deg(R) \leq t-1 \right\}.$$

We prove in Proposition 6 that, in characteristic 2,

$$\big(\mathrm{GRS}_t(x,y) + \mathrm{GRS}_t(x^2,y^2)\big) \cap \big(\mathrm{GRS}_t(x^2,y^2) + \mathrm{GRS}_t(x^4,y^4)\big) = \mathrm{GRS}_t(x^2,y^2).$$

Hence, we have a basis of $\mathrm{GRS}_t(x^2,y^2)$.

Our general method to disentangle the solutions is proved in characteristic 2,
but for other characteristics we need the following assumption:

Assumption 1. Let $q = p^s$ with p prime. Let $x \in \mathbb{F}_{q^m}^n$ be a support and $y \in \mathbb{F}_{q^m}^n$
be defined by $y = \Gamma(x)^{-1}$ for some polynomial $\Gamma(z) \in \mathbb{F}_{q^m}[z]$ of degree t. Let C
and $C'$ be two subsets of $\{1, \ldots, q\}$ with $(\#C + \#C')\,t < n$. Then, we have that:

$$\Big( \sum_{\ell \in C} \mathrm{GRS}_t(x^\ell, y^\ell) \Big) \cap \Big( \sum_{\ell \in C'} \mathrm{GRS}_t(x^\ell, y^\ell) \Big) = \sum_{\ell \in C \cap C'} \mathrm{GRS}_t(x^\ell, y^\ell).$$

For the specific subsets C that we encountered, this assumption is rigorously
proved in characteristic 2 (see Proposition 6). For larger characteristics, although
we could not find a formal proof, we ran more than 100,000 experiments
and found that equality held in all cases. We now generalize the method of
intersection of codes proposed in Example 2.

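Proposition 6. Let $q = p^s$ (p prime and s > 0). Let also a, $0 < a \leq s$, and
$C_a = \bigcup_{r=0}^{s-1-a} \{p^r, 2p^r, \ldots, (p-1)p^r\} \cup \{p^{s-a}\}$. Then:

$$\Big( \sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell) \Big) \cap \Big( \sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell) \Big)^{p^{s-a}} = \mathrm{GRS}_t\big(x^{p^{s-a}}, y^{p^{s-a}}\big).$$

Proof. Let $\Phi : (m_0, \ldots, m_{n-1}) \in \mathbb{F}_{q^m}^n \mapsto (m_0^{p^{s-a}}, \ldots, m_{n-1}^{p^{s-a}})$.

First, remark that, as $p^{s-a}$ is a power of the characteristic, it
holds that $\Phi(\mathrm{GRS}_t(x^\ell, y^\ell)) = \mathrm{GRS}_t(x^{p^{s-a}\ell}, y^{p^{s-a}\ell})$ for all $\ell$, and
$\Phi\big(\sum_{\ell \in C_a} \mathrm{GRS}_t(x^\ell, y^\ell)\big) = \sum_{\ell \in \Phi(C_a)} \mathrm{GRS}_t(x^\ell, y^\ell)$, where $\Phi(C_a) = \{p^{s-a}\ell \mid \ell \in C_a\}$.
When p = 2, we fully prove the proposition in Appendix A.4. Otherwise (when p > 2), we rely on
Assumption 1 with the sets $C_a$ and

$$\Phi(C_a) = \bigcup_{s-a \leq r \leq 2(s-a)-1} \{p^r, 2p^r, \ldots, (p-1)p^r\} \cup \{p^{2(s-a)}\}.$$

Then, we have $C_a \cap \Phi(C_a) = \{p^{s-a}\}$, and the desired equality. □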
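The set identity $C_a \cap \Phi(C_a) = \{p^{s-a}\}$ at the end of the proof is pure exponent bookkeeping, so it can be checked mechanically. The Python sketch below (hypothetical names) builds both index sets, compares $\Phi(C_a)$ with the closed form given in the proof, and checks the identity for small p, s and a. It only concerns the index sets; the corresponding statement on codes is what Assumption 1 (or Appendix A.4 for p = 2) provides.

```python
def C_set(p: int, s: int, a: int) -> set[int]:
    """C_a from Theorem 2 for q = p^s and 0 < a <= s."""
    return {j * p**r for r in range(s - a) for j in range(1, p)} | {p ** (s - a)}


def phi_set(p: int, s: int, a: int) -> set[int]:
    """Exponents obtained after applying Phi, i.e. multiplying each l in C_a by p^(s-a)."""
    return {p ** (s - a) * ell for ell in C_set(p, s, a)}


def phi_set_closed_form(p: int, s: int, a: int) -> set[int]:
    """The description of Phi(C_a) used in the proof of Proposition 6."""
    return {j * p**r for r in range(s - a, 2 * (s - a)) for j in range(1, p)} | {p ** (2 * (s - a))}


print(sorted(C_set(2, 3, 2)), sorted(phi_set(2, 3, 2)))  # [1, 2] [2, 4]
print(all(
    C_set(p, s, a) & phi_set(p, s, a) == {p ** (s - a)}
    and phi_set(p, s, a) == phi_set_closed_form(p, s, a)
    for p in (2, 3, 5, 7) for s in range(1, 6) for a in range(1, s + 1)
))  # True
```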

Once a basis of $\mathrm{GRS}_t(x^{p^{s-a}}, y^{p^{s-a}})$ is known, we recover x, y and $\Gamma(z)$ thanks
to a variant of Sidelnikov-Shestakov described below.
4.2 Sidelnikov-Shestakov Adapted to Recover the Goppa Polynomial

In our attack, we have to adapt the classical Sidelnikov-Shestakov attack to
special GRS codes, namely those for which there is an additional polynomial
relation between the support and the multipliers.

Proposition 7. Let x be an n-tuple $(x_0, \ldots, x_{n-1})$ of distinct elements of $\mathbb{F}_{q^m}$
and $\Gamma(z) \in \mathbb{F}_{q^m}[z]$ be a squarefree polynomial of degree t such that $\Gamma(x_i) \neq 0$
for all i, $0 \leq i \leq n-1$. Let $G_{GRS}$ be the generator matrix of a GRS code
$\mathrm{GRS}_t(x, \Gamma(x)^{-1})$. There is a polynomial-time algorithm which allows us to recover
an n-tuple $x' = (x'_0, \ldots, x'_{n-1})$ of distinct elements of $\mathbb{F}_{q^m}$ and a squarefree
polynomial $\Gamma'(z) \in \mathbb{F}_{q^m}[z]$ of degree t such that $\Gamma'(x'_i) \neq 0$ for all i, $0 \leq i \leq n-1$,
and $\mathrm{GRS}_t(x, \Gamma(x)^{-1}) = \mathrm{GRS}_t(x', \Gamma'(x')^{-1})$.

This problem is very close to the one addressed in [26]. The only issue is that the
homographic transformation on the support used in the original attack preserves
the GRS structure but not the polynomial link. Thus, polynomial interpolation
over x and $y^{-1}$ is no longer possible. We propose to avoid this homographic
transformation by considering a well-chosen extended code.

Definition 5. Let $\mathscr{C}$ be a linear code of length n over $\mathbb{F}_q$. The extended code of $\mathscr{C}$,
denoted by $\overline{\mathscr{C}}$, is a code of length n + 1 obtained by adding to each codeword
$m = (m_0, \ldots, m_{n-1})$ the coordinate $-\sum_{j=0}^{n-1} m_j$.
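Definition 5 amounts to one extra coordinate per codeword. The sketch below works over a prime field GF(p) used as a stand-in for the field of the code (an assumption; the helper names are hypothetical) and extends codewords and generator matrices. Extending each row of a generator matrix is enough, because the map $m \mapsto (m, -\sum_j m_j)$ is linear.

```python
def extend_codeword(m: list[int], p: int) -> list[int]:
    """Append the coordinate -(m_0 + ... + m_{n-1}) mod p (Definition 5)."""
    return list(m) + [(-sum(m)) % p]


def extend_generator_matrix(G: list[list[int]], p: int) -> list[list[int]]:
    """Rows of the result generate the extended code, since extension is linear."""
    return [extend_codeword(row, p) for row in G]


G = [[1, 0, 2], [0, 1, 3]]
print(extend_generator_matrix(G, 5))  # [[1, 0, 2, 2], [0, 1, 3, 1]]
```

Every extended codeword has coordinates summing to zero, which is what makes the extra coordinate behave like an additional evaluation point in Algorithm 1.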

Our algorithm, whose correctness is proved in the full version of this paper, is then the following.


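Algorithm 1. Extended Version of the Sidelnikov-Shestakov Algorithm

Input: $G_{GRS}$, a generator matrix of $\mathscr{C}_{GRS} = \mathrm{GRS}_t(x, y)$, with $y = \Gamma(x)^{-1}$ ($\deg(\Gamma) = t$)
Output: Secret x, y, and $\Gamma(z)$

1: Build $P = (p_{i,j})_{0 \leq i \leq n-t-1,\, 0 \leq j \leq n-1}$, a generator matrix of the dual of $\mathscr{C}_{GRS}$.
2: Deduce $\overline{P}$, a generator matrix of the extended code (Definition 5) of the code spanned by P.
3: Build $(I \mid U)$, with $U = (u_{i,j})_{0 \leq i \leq t,\, t+1 \leq j \leq n}$, a parity-check matrix of the code spanned
   by $\overline{P}$ in systematic form.
4: Solve the linear system with unknowns $X_i$ to find x:
   $\left\{ \frac{u_{i',j}}{u_{i,j}} (X_{i'} - X_j) = \frac{u_{i',n}}{u_{i,n}} (X_i - X_j) \;\middle|\; 0 \leq i, i' \leq t,\ t+1 \leq j \leq n-1 \right\}$
5: Solve the linear system with unknowns $Y_i$ to find y (the $x_i$'s were found at the
   previous step):
   $\left\{ \sum_{i=0}^{n-1} p_{j,i}\, x_i^{\ell}\, Y_i = 0 \;\middle|\; 0 \leq j \leq n-t-1,\ 0 \leq \ell \leq t-1 \right\}$
6: Interpolate $\Gamma(z)$ from x and y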
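Step 6 is ordinary polynomial interpolation: since $y_j = \Gamma(x_j)^{-1}$, any t + 1 pairs $(x_j, y_j^{-1})$ determine $\Gamma$. The minimal Python sketch below performs Lagrange interpolation over a prime field GF(p), which stands in for $\mathbb{F}_{q^m}$ (an assumption made to keep the example self-contained; the attack itself needs $\mathbb{F}_{q^m}$ arithmetic). The helper names and the toy instance are hypothetical. Since the multipliers of a GRS code are only defined up to a nonzero scalar, $\Gamma$ is recovered up to the same scalar, which leaves the Goppa code unchanged.

```python
def poly_mul_linear(poly: list[int], root: int, p: int) -> list[int]:
    """Multiply poly (coefficients, lowest degree first) by (z - root) modulo p."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % p
        out[i + 1] = (out[i + 1] + c) % p
    return out


def interpolate(xs: list[int], vals: list[int], p: int) -> list[int]:
    """Lagrange interpolation over GF(p): the polynomial of degree < len(xs) through the points."""
    n = len(xs)
    result = [0] * n
    for i in range(n):
        basis, denom = [1], 1
        for j in range(n):
            if j != i:
                basis = poly_mul_linear(basis, xs[j], p)
                denom = denom * (xs[i] - xs[j]) % p
        scale = vals[i] * pow(denom, -1, p) % p  # xs distinct, so denom != 0
        for k, c in enumerate(basis):
            result[k] = (result[k] + scale * c) % p
    return result


def recover_gamma(x: list[int], y: list[int], t: int, p: int) -> list[int]:
    """Step 6 sketch: y_j = 1 / Gamma(x_j), so interpolate (x_j, y_j^{-1}) on t + 1 points."""
    vals = [pow(yj, -1, p) for yj in y[: t + 1]]
    return interpolate(x[: t + 1], vals, p)


# Toy instance over GF(13): Gamma(z) = z^2 + z + 3, support x = (0, 1, 2, 3, 4)
x = [0, 1, 2, 3, 4]
y = [9, 8, 3, 7, 4]  # y_j = Gamma(x_j)^{-1} mod 13
print(recover_gamma(x, y, 2, 13))  # [3, 1, 1], i.e. 3 + z + z^2
```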
4.3 Recovery of the Incognito Polynomial by Solving a Linear System

An extra step is necessary in the Incognito case to recover the other factor f
of the Goppa polynomial. To do so, we recover the multipliers associated to f,
that is the vector $w = f(x)^{-1}$. Then, we perform polynomial interpolation. We
note that once x and $y = \Gamma(x)^{-1}$ are known, many of the equations of Lemma
3 become linear in w. Namely,

$$\bigcup_{u_y=1}^{q} \left\{ \sum_{j=0}^{n-1} g_{i,j}\, w_j\, y_j^{u_y} x_j^{u_x} = 0 \;\middle|\; 0 \leq i \leq k-1,\ 0 \leq u_x \leq u_y t + \deg(f) - 1 \right\}.$$


In practice, we observed that the linear system obtained has a rank defect
and is not sufficient to find w. However, we can also use the fact that $\mathscr{C}_{pub} \subset
\mathscr{G}_q(x, f(z))$ to prove that


n— 1 


j = 0 


1 

LC(7) 



(This is rigorously done in the full version of this article.) Since x is known, and
setting LC(f) = 1, we obtain new linear equations in the components of w.
Putting all the linear equations together, experiments show that we then obtain
a unique solution w, and f by polynomial interpolation.


5 Weakness of Non-prime Base Fields 


The most (computationally) difficult part of our attack against Wild McEliece
Incognito is to solve the algebraic system defined in Theorem 2. In this part,
we aim at giving a better idea of the complexity of the resolution by determining
the exact number of “free variables” in the system. Namely, we show that we
can eliminate many variables thanks to linear equations. The system
$\mathcal{W}_{q,a}(Z) = \bigcup_{u \in V_a} \{\sum_{j=0}^{n-1} g_{i,j} Z_j^u = 0 \mid 0 \leq i \leq k-1\}$ of Theorem 2 obviously contains k
linear equations, obtained by picking u = 1 ($1 \in V_a$ by definition). We can easily derive
other linear equations by applying the additive map $z \mapsto z^{q^m/p^u}$ to all the
equations of degree $p^u$. As the solutions lie in $\mathbb{F}_{q^m}$, it holds that $(Z_j^{p^u})^{q^m/p^u} = Z_j$,
and for $0 \leq i \leq k-1$:



$$\sum_{j=0}^{n-1} g_{i,j}^{q^m/p^u} Z_j = 0.$$
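Written out, the step from an equation of degree $p^u$ to a linear one uses only that $z \mapsto z^{p^k}$ is additive in characteristic p (here $q^m/p^u$ is a power of p because $p^u \leq p^s \leq q^m$) and that $Z_j^{q^m} = Z_j$ for $Z_j \in \mathbb{F}_{q^m}$:

$$\Big(\sum_{j=0}^{n-1} g_{i,j} Z_j^{p^u}\Big)^{q^m/p^u} = \sum_{j=0}^{n-1} g_{i,j}^{q^m/p^u}\, \big(Z_j^{p^u}\big)^{q^m/p^u} = \sum_{j=0}^{n-1} g_{i,j}^{q^m/p^u}\, Z_j^{q^m} = \sum_{j=0}^{n-1} g_{i,j}^{q^m/p^u}\, Z_j .$$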


However, we observed that those linear equations were very redundant. To explain
these linear dependencies, we found a property of the masked Wild Goppa
codes $\mathscr{G}_q(x, f(z)\Gamma(z)^q)$ (Theorem 8). Namely, by simple operations on their
generator matrices, we can build a generator matrix of the code $\mathscr{G}_p(x, \Gamma(z)^p)$ over $\mathbb{F}_p$.
This latter matrix allows us to write many independent linear equations involving the
private elements of $\mathscr{C}_{pub}$.
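Theorem 8. Let $q = p^s$ (p prime, and s > 0). Let $G_{pub} = (g_{i,j})_{0 \leq i \leq k-1,\, 0 \leq j \leq n-1}$ be
a generator matrix of a masked Wild Goppa code $\mathscr{C}_{pub} = \mathscr{G}_q(x, f(z)\Gamma(z)^{q-1})$.
We consider the scalar restriction of $m \in \mathscr{C}_{pub} \subset \mathbb{F}_q^n$ into $\mathbb{F}_p^n$. This yields s
components $m^{(0)}, \ldots, m^{(s-1)} \in \mathbb{F}_p^n$ (we write each $m \in \mathbb{F}_q^n$ over an $\mathbb{F}_p$-basis
$(\theta_0, \ldots, \theta_{s-1})$ of $\mathbb{F}_q$, i.e. $m = m^{(0)}\theta_0 + \cdots + m^{(s-1)}\theta_{s-1}$). Let $\mathscr{C}_{\mathbb{F}_p} \subset \mathbb{F}_p^n$ be the
code generated by the coordinate vectors $m^{(0)}, \ldots, m^{(s-1)}$ for all the codewords
$m \in \mathscr{C}_{pub}$. Then, it holds that

$$\mathscr{C}_{\mathbb{F}_p} \subset \mathscr{G}_p(x, \Gamma(z)^p).$$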
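The scalar restriction in Theorem 8 is a purely mechanical operation. The sketch below models $\mathbb{F}_9$ as $\mathbb{F}_3[\theta]/(\theta^2+1)$ (an illustrative choice; $\theta^2 + 1$ is irreducible over $\mathbb{F}_3$) with the $\mathbb{F}_3$-basis $(1, \theta)$, and produces a spanning set of $\mathscr{C}_{\mathbb{F}_p}$ from a generator matrix. It relies on the fact that, since $\mathscr{C}_{pub}$ is $\mathbb{F}_q$-linear, it suffices to restrict $\theta_k g$ for every row g and every basis element $\theta_k$, which gives $s \cdot k$ vectors. The helper names are hypothetical.

```python
P = 3  # F_9 = F_3[theta]/(theta^2 + 1); an illustrative model of F_q with q = p^s = 9


def f9_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Multiply a0 + a1*theta by b0 + b1*theta in F_9, using theta^2 = -1."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 - a1 * b1) % P, (a0 * b1 + a1 * b0) % P)


def scalar_restriction(vec: list[tuple[int, int]]) -> list[list[int]]:
    """Write vec = m0 * 1 + m1 * theta with m0, m1 in F_3^n; return [m0, m1]."""
    return [[c[k] for c in vec] for k in range(2)]


def restricted_spanning_set(G: list[list[tuple[int, int]]]) -> list[list[int]]:
    """Vectors over F_3 spanning C_{F_p}: components of lam * g for lam in {1, theta}.

    Every codeword is an F_9-combination of the rows g, and F_9 = F_3 + F_3*theta,
    so the components of 1*g and theta*g (for all rows g) span C_{F_p}.
    """
    out = []
    for row in G:
        for lam in [(1, 0), (0, 1)]:  # the basis elements 1 and theta
            out.extend(scalar_restriction([f9_mul(lam, c) for c in row]))
    return out


# One row of a toy generator matrix over F_9: (1 + 2*theta, theta, 2)
print(restricted_spanning_set([[(1, 2), (0, 1), (2, 0)]]))
# [[1, 0, 2], [2, 1, 0], [1, 2, 0], [1, 0, 2]]
```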

The proof can be found in the full version of this paper. In practice, we observed
equality in the inclusion provided $s \cdot \dim(\mathscr{C}_{pub}) > \dim(\mathscr{G}_p(x, \Gamma(z)^p))$. Note that
$\mathscr{G}_p(x, \Gamma(z)^p)$ is a Wild Goppa code with the same private elements x and $y =
\Gamma(x)^{-1}$ as $\mathscr{C}_{pub}$. This provides extra equations on the variables Z of $\mathcal{W}_{q,a}(Z)$
(proved in the full version):

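Proposition 9. Let $\mathscr{C}_{pub} = \mathscr{G}_q(x, f(z)\Gamma^{q-1}(z))$ and $\mathcal{W}_{q,a}(Z)$ the associated
system for $1 \leq a \leq s$. Let $G_{\mathbb{F}_p} = (\tilde{g}_{i,j})_{0 \leq i \leq k_p-1,\, 0 \leq j \leq n-1}$ be a generator matrix of
$\mathscr{G}_p(x, \Gamma(z)^p)$ (with $k_p = \dim(\mathscr{G}_p(x, \Gamma(z)^p))$). Then, the solutions of $\mathcal{W}_{q,a}(Z)$
satisfy: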

p— 1 ( n — 1 ^ 

U I T 9i,i z i = 0|0<u<f-l,0<i<fcp-l>. 

e=o { j= o J 

As $k_p \geq n - (p-1)mst$ (and in practice $k_p = n - (p-1)mst$), we have the
following corollary.

Corollary 10. The knowledge of $G_{pub}$ gives access to $n - (p-1)mst$ independent
linear relations between the $Z_j$'s. The system $\mathcal{W}_{q,a}(Z)$ contains (at most) $(p-1)mst$
free variables.

Remark 2. The number of “free” variables given in Corollary 10 does not take
into account the vector space structure of the solutions. Thanks to
Corollary 4, we know that $A_{a,t}$ extra variables can be fixed to arbitrary values
in $\mathcal{W}_{q,a}(Z)$.

For a Goppa polynomial of the same degree, but without multiplicities, the number
of free variables in the system would be n − k, that is $(p^s - 1)mt$ in most cases,
instead of $(p-1)mst$. In particular, for a masked code, the number of variables
describing it does not depend on the degree of the Incognito polynomial f, and the
attack is not harder for masked codes. This explains why the codes defined over
non-prime fields are the weakest ones.

6 Practical Experiments 

We report below various experimental results obtained with our attack on
various parameters for which [6] said that strength is “unclear” and that an
attack would not be a “surprise”, but for which no actual attack was known. We
also generated our own keys/parameters to see how the attack scales. We
performed our experiments with off-the-shelf tools (Magma [9] V2.19-1) on
a 2.93 GHz Intel PC with 128 GB of RAM. As the polynomial system solving is
by far the most costly step, we give timings only for this one. We performed it
using the F4 algorithm [12] of Magma. As explained in Section 3.2, it is
necessary to solve the systems $\mathcal{W}_{q,a}(Z)$ a number of times equal to the dimension of
the vector space of the solutions (Theorem 2). These resolutions are completely
independent and can be executed in parallel. This is why we give the timings
in the form (number of separate resolutions) × (time for one resolution). By
#Z, we denote the number of free variables remaining in the system after
cleaning up the linear equations (Corollary 10) and fixing coordinates thanks to the
vector space structure of the solutions (Corollary 4). The general formula is
$\#Z = ((p-1)ms - \#C_a)\,t$ for $q = p^s$ and s > 1.
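The formula for #Z is straightforward to tabulate. The Python sketch below (hypothetical function names) evaluates #Z together with the number $A_{a,t} = t \times \#C_a$ of independent resolutions, using $\#C_a = (s-a)(p-1)+1$, which follows from the definition of $C_a$, and the choice of a discussed in this section (a = 1 for p > 2, a = 2 for p = 2). For q = 27, m = 3, t = 3 it gives 15 resolutions and #Z = 39, the values reported in Tables 1 and 2; it also contrasts $(p^s-1)mt$ with $(p-1)mst$ as discussed in Section 5.

```python
def size_C(p: int, s: int, a: int) -> int:
    """#C_a = (s - a)(p - 1) + 1, from the definition of C_a in Theorem 2."""
    return (s - a) * (p - 1) + 1


def num_free_variables(p: int, s: int, m: int, t: int, a: int) -> int:
    """#Z = ((p - 1) m s - #C_a) t: free variables left after linear cleaning (Cor. 10)
    and after fixing A_{a,t} = t * #C_a coordinates (Cor. 4)."""
    return ((p - 1) * m * s - size_C(p, s, a)) * t


def num_resolutions(p: int, s: int, t: int, a: int) -> int:
    """A_{a,t} = t * #C_a independent resolutions of W_{q,a}(Z)."""
    return t * size_C(p, s, a)


def best_a(p: int) -> int:
    """Choice of a used in the experiments: a = 1 if p > 2, a = 2 if p = 2."""
    return 2 if p == 2 else 1


# q = 27 = 3^3, m = 3, t = 3: 15 resolutions, #Z = 39
p, s, m, t = 3, 3, 3, 3
a = best_a(p)
print(num_resolutions(p, s, t, a), num_free_variables(p, s, m, t, a))  # 15 39
# Same degree without multiplicities: (p^s - 1) m t variables instead of (p - 1) m s t
print((p**s - 1) * m * t, (p - 1) * m * s * t)  # 234 54
```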

In the experiments, we tried various parameters a for the systems $\mathcal{W}_{q,a}(Z)$.
We give a comparison on some examples in Table 1 (the system $\mathcal{W}_{q,a}(Z)$ with
a = s can be solved in a reasonable amount of time in only a few cases).

Table 1. Comparison of the resolution times of $\mathcal{W}_{q,a}(Z)$ for various possible a's. The
smallest possible a gives the best timings.


| q | m | t | n | k | deg(f) | Solving $\mathcal{W}_{q,a}(Z)$ with a = s | Solving $\mathcal{W}_{q,a}(Z)$, optimal a |
|---|---|---|---|---|---|---|---|
| 32 | 2 | 2 | 678 | 554 | 0 | 2 × 12 s (#Z = 18) | 8 × 0.08 s (a = 2, #Z = 9) |
| 32 | 2 | 1 | 532 | 406 | 32 | 2 × 49 s (#Z = 9) | 4 × 0.02 s (a = 2, #Z = 6) |
| 32 | 2 | 3 | 852 | 621 | 24 | 3 × (30 min 46 s) (#Z = 37) | 12 × 0.6 s (a = 2, #Z = 18) |
| 27 | 3 | 3 | 1312 | 1078 | 0 | 3 × (3 h 10 min) (#Z = 51) | 15 × 3.0 s (a = 1, #Z = 39) |


It appeared that a should be chosen so as to maximize the dimension of the
solution set (Theorem 2). This choice minimizes the number of variables. Namely,
the best choice is to set a = 1 when p > 2. When p = 2, setting a = 1 would yield
only “linear” equations (of degree $2^u$, $u \leq s$). So, we set a = 2, and the only
non-linear equations in the systems $\mathcal{W}_{2^s,2}(Z)$ are cubic. We recall that for a = s,
Assumption 1 is not necessary, whereas we rely on it when a < s and p ≠ 2. In the
rest of the experiments, we always pick the best choice for a.

In Table 2 we present experimental results obtained with Wild McEliece
(when deg(f) = 0) and Incognito (deg(f) > 0) parameters. For Wild McEliece,
all the parameters in the scope of our attack were marked in [6, Table 7.1] with
the international biohazard symbol ☣. The reason is that, for those parameters,
enumerating all the possible Goppa polynomials is computationally feasible. In
the current state of the art, to apply the SSA attack ([18]), one would not only
have to enumerate the irreducible polynomials of $\mathbb{F}_{q^m}[z]$, but also all the possible
support sets, as the support-splitting algorithm uses the support set as input.
This introduces a factor $\binom{q^m}{n}$ in the cost of SSA, chosen by the designers in order
to make the attack infeasible. However, the authors of [6] conclude that, even
if no attack is known against those instances, algorithmic progress in support
enumeration may be possible, and therefore they do not recommend their use. In
the case of non-prime base fields, experiments show that our attack represents
a far more serious threat to the security of some of those instances: for
$q \in \{32, 27, 16\}$ we could find the secret keys of parameters with high ISD complexity.
We indicate, for each set of parameters, the ISD complexity (obtained with
Peters' software¹), as it remains the reference to evaluate the security of a
McEliece scheme. We also give the complexity of an SSA attack, which is, in the
current state of the art, $\binom{q^m}{n} \cdot q^{mt}/t$.

Regarding Wild McEliece Incognito, we broke the parameters indicated with
a security of $2^{128}$ in [8, Table 5.1] for $q \in \{32, 27, 16\}$. For some other
non-prime base fields, we give the hardest parameters in the scope of our attack in
roughly one day of computation. Note that here, the SSA complexity is given by
$\binom{q^m}{n} \cdot \big(q^{m(t+s)}/(ts)\big)$.

For the sake of completeness, we also include in Table 2 Wild McEliece
schemes with a quadratic extension. In [11], the authors already presented a
polynomial-time attack in this particular case: it applies to the parameters with
q = 32, but not to the other ones. We want to stress that our attack also works for
m = 2 and any t ([11] does not work in the extreme case t = 1). Also, we emphasize
that, although it solves a non-linear system, our attack is actually faster than [11]
in some cases. For q = 32 and t = 4, the attack of [11] requires 49.5 minutes
(using a non-optimized Magma implementation according to the authors). We
can mount our attack in several seconds with the techniques of this paper.


Table 2. Practical experiments with Wild McEliece & Incognito parameters. ISD
complexity is obtained with Peters' software¹. SSA attack complexity is given
in the form (support enumeration) · (Goppa polynomial enumeration).


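| q | m | t | n | k | deg(f) | Key (kB) | ISD | SSA | Solving $\mathcal{W}_{q,a}(Z)$, optimal a |
|---|---|---|---|---|---|---|---|---|---|
| 32 | 2 | 4 | 841 | 601 | 0 | 92 | $2^{128}$ | $2^{688} \cdot 2^{33}$ | 16 × 10 s (#Z = 36) |
| 32 | 2 | 5 | 800 | 505 | 0 | 93 | $2^{136}$ | $2^{771} \cdot 2^{48}$ | 20 × (2 min 45 s) (#Z = 40) |
| 27 | 3 | 3 | 1312 | 1078 | 0 | 45 | $2^{113}$ | $2^{6947} \cdot 2^{41}$ | 15 × 3.0 s (#Z = 39) |
| 27 | 3 | 4 | 1407 | 1095 | 0 | 203 | $2^{128}$ | $2^{7304} \cdot 2^{55}$ | 20 × (6 min 34 s) (#Z = 52) |
| 27 | 3 | 5 | 1700 | 1310 | 0 | 304 | $2^{158}$ | $2^{8343} \cdot 2^{69}$ | 25 × (1 h 59 min) (#Z = 65) |
| 27 | 3 | 5 | 1800 | 1410 | 0 | 327 | $2^{160}$ | $2^{8679} \cdot 2^{69}$ | 25 × (1 h 37 min) (#Z = 65) |
| 16 | 3 | 6 | 1316 | 1046 | 0 | 141 | $2^{129}$ | $2^{3703} \cdot 2^{69}$ | 18 × (36 h 26 min) (#Z = 54) |
| 32 | 2 | 3 | 852 | 621 | 24 | 90 | $2^{130}$ | $2^{603} \cdot 2^{273}$ | 12 × 0.6 s (#Z = 18) |
| 27 | 3 | 2 | 1500 | 1218 | 42 | 204 | $2^{128}$ | $2^{5253} \cdot 2^{225}$ | 10 × 0.9 s (#Z = 26) |
| 25 | 3 | 3 | 1206 | 915 | 25 | 155 | $2^{117}$ | $2^{7643} \cdot 2^{632}$ | 15 × (1 h 2 min) (#Z = 57) |
| 16 | 3 | 6 | 1328 | 1010 | 16 | 160 | $2^{125}$ | $2^{3716} \cdot 2^{265}$ | 18 × (36 h 35 min) (#Z = 54) |
| 9 | 3 | 6 | 728 | 542 | 14 | 40 | $2^{81}$ | $2^{2759} \cdot 2^{191}$ | 18 × (25 h 13 min) (#Z = 54) |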


1 Available at http://christianepeters.wordpress.com/publications/tools/ 
7 Conclusion and Future Work

In practice, we could not solve (in less than two days) the algebraic systems
involved when the number of free variables exceeds 65. We recall the relation
$\#Z = ((p-1)ms - \#C_a)\,t$ (for $q = p^s$ and s > 1), which should help
designers to scale their parameters. An important remaining open question is to
give precise complexity estimates for the polynomial system solving phase in
those cases.


Acknowledgements. This work was supported in part by the HPAC grant 
(ANR ANR-11-BS02-013) of the French National Research Agency. The authors 
would also like to thank (some of) the referees as well as PC chairs for their 
useful comments on a preliminary version of this paper. 


References 

1. Barbier, M., Barreto, P.S.L.M.: Key reduction of McEliece’s cryptosystem using
list decoding. In: Kuleshov, A., Blinovsky, V., Ephremides, A. (eds.) 2011 IEEE
International Symposium on Information Theory Proceedings, ISIT 2011, St.
Petersburg, Russia, July 31 - August 5, pp. 2681-2685. IEEE (2011)

2. Barreto, P.S.L.M., Lindner, R., Misoczki, R.: Monoidic codes in cryptography. In: 
Yang (ed.) [27], pp. 179-199 

3. Becker, A., Joux, A., May, A., Meurer, A.: Decoding random binary linear codes
in 2^{n/20}: How 1 + 1 = 0 improves information set decoding. In: Pointcheval, D.,
Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 520-536. Springer,
Heidelberg (2012)

4. Berger, T.P., Cayrel, P.-L., Gaborit, P., Otmani, A.: Reducing key length of 
the McEliece cryptosystem. In: Preneel, B. (ed.) AFRICACRYPT 2009. LNCS, 
vol. 5580, pp. 77-97. Springer, Heidelberg (2009) 

5. Bernstein, D.J., Lange, T., Peters, C.: Attacking and defending the McEliece cryp- 
tosystem. In: Buchmann, J., Ding, J. (eds.) PQCrypto 2008. LNCS, vol. 5299, 
pp. 31-46. Springer, Heidelberg (2008) 

6. Bernstein, D.J., Lange, T., Peters, C.: Wild McEliece. In: Biryukov, A., Gong, G.,
Stinson, D.R. (eds.) SAC 2010. LNCS, vol. 6544, pp. 143-158. Springer, Heidelberg
(2011)

7. Bernstein, D.J., Lange, T., Peters, C.: Smaller decoding exponents: Ball-collision 
decoding. In: Rogaway, P. (ed.) CRYPTO 2011. LNCS, vol. 6841, pp. 743-760. 
Springer, Heidelberg (2011) 

8. Bernstein, D.J., Lange, T., Peters, C.: Wild McEliece incognito. In: Yang (ed.) [27], 
pp. 244-254 

9. Bosma, W., Cannon, J.J., Playoust, C.: The Magma algebra system I: The user 
language. Journal of Symbolic Computation 24(3-4), 235-265 (1997) 

10. Canteaut, A., Chabaud, F.: A new algorithm for finding minimum-weight words in
a linear code: Application to McEliece’s cryptosystem and to narrow-sense BCH
codes of length 511. IEEE Transactions on Information Theory 44(1), 367-378
(1998)





J.-C. Faugere, L. Perret, and F. de Portzamparc 


11. Couvreur, A., Otmani, A., Tillich, J.-P.: Polynomial time attack on wild McEliece 
over quadratic extensions. In: Nguyen, P.Q., Oswald, E. (eds.) EUROCRYPT 2014. 
LNCS, vol. 8441, pp. 17-39. Springer, Heidelberg (2014) 

12. Faugere, J.-C.: A new efficient algorithm for computing Gröbner bases (F4). Journal
of Pure and Applied Algebra 139(1-3), 61-88 (1999)

13. Faugere, J.-C., Otmani, A., Perret, L., de Portzamparc, F., Tillich, J.-P.: Structural 
cryptanalysis of McEliece schemes with compact keys. IACR Cryptology ePrint 
Archive, 2014:210 (2014) 

14. Faugere, J.-C., Otmani, A., Perret, L., Tillich, J.-P.: Algebraic cryptanalysis of
McEliece variants with compact keys. In: Gilbert, H. (ed.) EUROCRYPT 2010.
LNCS, vol. 6110, pp. 279-298. Springer, Heidelberg (2010)

15. Faugere, J.-C., Otmani, A., Perret, L., Tillich, J.-P.: Algebraic Cryptanalysis of 
McEliece variants with compact keys - toward a complexity analysis. In: SCC 
2010: Proceedings of the 2nd International Conference on Symbolic Computation 
and Cryptography, pp. 45-55. RHUL (June 2010) 

16. Heyse, S.: Implementation of McEliece based on quasi-dyadic Goppa codes for 
embedded devices. In: Yang (ed.) [27], pp. 143-162 

17. Leon, J.S.: A probabilistic algorithm for computing minimum weights of large 
error-correcting codes. IEEE Transactions on Information Theory 34(5), 1354-1359 
(1988) 

18. Loidreau, P., Sendrier, N.: Weak keys in the McEliece public-key cryptosystem.
IEEE Transactions on Information Theory 47(3), 1207-1211 (2001) 

19. May, A., Meurer, A., Thomae, E.: Decoding random linear codes in $\tilde{O}(2^{0.054n})$.
In: Lee, D.H., Wang, X. (eds.) ASIACRYPT 2011. LNCS, vol. 7073, pp. 107-124. 
Springer, Heidelberg (2011) 

20. McEliece, R.J.: A Public-Key System Based on Algebraic Coding Theory, 
pp. 114-116. Jet Propulsion Lab (1978), DSN Progress Report 44 

21. Misoczki, R., Barreto, P.S.L.M.: Compact McEliece keys from Goppa codes. In: 
Jacobson Jr., M.J., Rijmen, V., Safavi-Naini, R. (eds.) SAC 2009. LNCS, vol. 5867, 
pp. 376-392. Springer, Heidelberg (2009) 

22. Niederreiter, H.: Knapsack-type cryptosystems and algebraic coding theory. Problems
Control Inform. Theory 15(2), 159-166 (1986)

23. Persichetti, E.: Compact McEliece keys based on quasi-dyadic Srivastava codes. J.
Mathematical Cryptology 6(2), 149-169 (2012) 

24. Peters, C.: Information-set decoding for linear codes over $\mathbb{F}_q$. In: Sendrier, N. (ed.)
PQCrypto 2010. LNCS, vol. 6061, pp. 81-94. Springer, Heidelberg (2010) 

25. Sendrier, N.: Finding the permutation between equivalent linear codes: The support 
splitting algorithm. IEEE Transactions on Information Theory 46(4), 1193-1203 
(2000) 

26. Sidelnikov, V., Shestakov, S.: On the insecurity of cryptosystems based on gener- 
alized Reed-Solomon codes. Discrete Mathematics and Applications 1(4), 439-444 
(1992) 

27. Yang, B.-Y. (ed.): PQCrypto 2011. LNCS, vol. 7071. Springer, Heidelberg (2011) 



Algebraic Attack against Variants of McEliece with Goppa Polynomial 




A Appendix 

A.1 A Technical Lemma

We prove a technical lemma which is useful for the proofs of Sections 3 and 4.

Lemma 1. Let $q = p^s$ ($p$ prime and $s > 0$), and let $Q = \gamma_t z^t + \cdots + \gamma_0 \in \mathbb{F}_{q^m}[z]$ be a polynomial of degree $t$. For all $j$, it holds that:

$$Q(z)^{p^j} = \gamma_t^{p^j}(z^t)^{p^j} + \cdots + \gamma_0^{p^j} = \gamma_t^{p^j}(z^{p^j})^t + \cdots + \gamma_0^{p^j} = F^{(j)}(Q)(z^{p^j}),$$

where $F^{(j)}(Q) = \gamma_t^{p^j} z^t + \cdots + \gamma_0^{p^j}$ is the polynomial of the same degree as $Q$ obtained by raising all the coefficients to the $p^j$-th power.
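The identity in Lemma 1 is easy to check mechanically. The sketch below is our own illustration, not code from the paper. It assumes $p = 3$ and represents $\mathbb{F}_9$ as $\mathbb{F}_3[i]/(i^2+1)$ (a representation we chose for the example), and it compares $Q(z)^{p^j}$ with $F^{(j)}(Q)(z^{p^j})$ for an arbitrarily chosen $Q$.

```python
# Toy check of Lemma 1 over F_9 = F_3[i]/(i^2 + 1); an illustration, not the paper's code.
P = 3                        # the characteristic p
ONE, ZERO = (1, 0), (0, 0)   # an element a + b*i is stored as the pair (a, b)

def f9_add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)

def f9_mul(x, y):
    # (a + b i)(c + d i) = (ac - bd) + (ad + bc) i, using i^2 = -1
    a, b = x
    c, d = y
    return ((a * c - b * d) % P, (a * d + b * c) % P)

def f9_pow(x, e):
    r = ONE
    for _ in range(e):
        r = f9_mul(r, x)
    return r

def poly_mul(f, g):
    # Polynomials are coefficient lists: f[k] is the coefficient of z^k
    out = [ZERO] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = f9_add(out[i + j], f9_mul(a, b))
    return out

def poly_pow(f, e):
    r = [ONE]
    for _ in range(e):
        r = poly_mul(r, f)
    return r

def frobenius_coeffs(f, j):
    # F^{(j)}(Q): raise every coefficient to the p^j-th power
    return [f9_pow(c, P ** j) for c in f]

def substitute_power(f, k):
    # f(z^k): the coefficient of z^i moves to z^(i*k)
    out = [ZERO] * ((len(f) - 1) * k + 1)
    for i, c in enumerate(f):
        out[i * k] = c
    return out

def lemma1_holds(Q, j):
    lhs = poly_pow(Q, P ** j)                               # Q(z)^(p^j)
    rhs = substitute_power(frobenius_coeffs(Q, j), P ** j)  # F^{(j)}(Q)(z^(p^j))
    return lhs == rhs

# Q = (1 + i) z^3 + 2 z^2 + i z + (1 + 2i), a degree-3 polynomial over F_9
Q = [(1, 2), (0, 1), (2, 0), (1, 1)]
print([lemma1_holds(Q, j) for j in range(3)])  # [True, True, True]
```

If the Frobenius on the coefficients is dropped, the identity fails as soon as a coefficient lies outside $\mathbb{F}_3$ (here $i^3 = -i$). This is why $F^{(j)}$ appears in the lemma.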

A.2 Proof of Lemma 3

We want to prove that, under the conditions of Lemma 3 (that is, $0 \le u_y \le q$, $0 \le u_x \le u_y t - 1$, $0 \le u \le \deg(f) - 1$, $b \in \{0, 1\}$ and $(b, u_y) \neq (0, 0)$), it holds that



Proof. The crucial remark is that, for any $c \in \mathbb{F}_q^n$, $\sum_i \frac{c_i}{z - x_i} = 0 \bmod f(z)\Gamma^q(z)$ implies $\sum_i \frac{c_i}{z - x_i} = 0 \bmod f^b(z)\Gamma^{u_y}(z)$ for all $0 \le u_y \le q$ and $0 \le b \le 1$ (with $(u_y, b) \neq (0, 0)$). In other words, for those $u_y, b$, it holds that

$$\mathcal{G}(x, f(z)\Gamma^q(z)) \subseteq \mathcal{G}(x, f^b(z)\Gamma^{u_y}(z)).$$

As $\mathcal{G}(x, f^b(z)\Gamma^{u_y}(z))$ has parity-check matrix $V_{d_{tot}}(x, w^b y^{u_y})$ (with $d_{tot} = b\deg(f) + u_y t$), the matrix products $V_{d_{tot}}(x, w^b y^{u_y}) \times G^T = 0_{d_{tot} \times k}$ yield all the relations of the lemma. □


A.3 Proof of Theorem 2

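Proof. We give the multinomial expansion of $\big(\sum_{\ell\in\mathcal{L}_a} y_j^{\ell} Q_\ell(x_j)\big)^u$ in the form $\sum \alpha_{u_x,u_y}\, y_j^{u_y} x_j^{u_x}$ and show that $u_x, u_y$ satisfy the conditions of Lemma 3. This is done separately for $u \in \mathcal{P}_1$ and $u \in \mathcal{P}_2$.

Case $u \in \mathcal{P}_1$. We pick $u \in \{1, 2, \ldots, p^a - 1\}$ and use the multinomial formula to expand $\big(\sum_{\ell\in\mathcal{L}_a} y_j^{\ell} Q_\ell(x_j)\big)^u$. Namely, with $L_a = \#\mathcal{L}_a$, we have:

$$\Big(\sum_{\ell\in\mathcal{L}_a} y_j^{\ell} Q_\ell(x_j)\Big)^u = \sum_{\substack{(u_\ell)_{\ell\in\mathcal{L}_a} \\ \sum_\ell u_\ell = u}} \binom{u}{(u_\ell)_{\ell\in\mathcal{L}_a}}\, y_j^{\sum_{\ell\in\mathcal{L}_a} \ell u_\ell} \prod_{\ell\in\mathcal{L}_a} Q_\ell(x_j)^{u_\ell}.$$

Let us look at each term $y_j^{u_y} x_j^{u_x}$ in the sum. For the $L_a$ non-negative integers $(u_\ell)_{\ell\in\mathcal{L}_a}$ with $\sum_{\ell} u_\ell = u$, it holds that $u_y = \sum_{\ell\in\mathcal{L}_a} \ell u_\ell \le \max(\mathcal{L}_a) \sum_{\ell\in\mathcal{L}_a} u_\ell \le p^{s-a} u < p^s$. For each $y_j^{u_y}$, several terms $y_j^{u_y} x_j^{u_x}$ appear after expanding $\prod_{\ell} Q_\ell(x_j)^{u_\ell}$. In $Q_\ell(x_j)^{u_\ell}$ the maximal power $u_x$ appearing is $u_\ell(t-1)$ (as $Q_\ell$ has degree $t-1$). Thus, in $\prod_{\ell\in\mathcal{L}_a} Q_\ell(x_j)^{u_\ell}$ the maximal power is $(t-1)\sum_{\ell\in\mathcal{L}_a} u_\ell = (t-1)u \le (t-1)u_y \le t u_y - 1$, where we used $u \le u_y$ (every $\ell \in \mathcal{L}_a$ is at least 1) and $u_y \ge 1$.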

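Case $u \in \mathcal{P}_2$. We pick $b \in \{a, \ldots, s\}$. Then $\big(\sum_{\ell\in\mathcal{L}_a} y_j^{\ell} Q_\ell(x_j)\big)^{p^b} = \sum_{\ell\in\mathcal{L}_a} y_j^{\ell p^b} F^{(b)}(Q_\ell)(x_j^{p^b})$ (Lemma 1). Pick $\ell \in \mathcal{L}_a$. It can be written $\ell = \alpha p^c$ with $1 \le \alpha < p$ and $0 \le c \le s - a$. Thus we have $\ell p^b = \alpha p^{c+b}$. The Euclidean division of $c + b$ by $s$ gives $c + b = ds + e$ with $0 \le e < s$. The exponent $\ell p^b$ can then be written $\ell p^b = \alpha p^e p^{ds} = \alpha p^e q^d$. As $y_j^{q^m} = y_j$, it holds that

$$\Big(\sum_{\ell\in\mathcal{L}_a} y_j^{\alpha p^e q^d} F^{(b)}(Q_\ell)(x_j^{p^b})\Big)^{q^m/q^d} = \sum_{\ell\in\mathcal{L}_a} y_j^{\alpha p^e} F^{(b-ds)}(Q_\ell)(x_j^{p^{b-ds}}).$$

As the $F^{(b-ds)}(Q_\ell)$'s have degree lower than $t$, all the terms of the sum are of the form $y_j^{u_y} x_j^{u_x}$ with $u_y \le q$ (since $\alpha p^e < p^s$) and $u_x \le u_y t - 1$. □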
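The exponent bookkeeping in this case ($\ell = \alpha p^c$, $c + b = ds + e$, $\ell p^b = \alpha p^e q^d$) can be sanity-checked on small numbers. The helper `decompose_exponent` and the parameter values below are hypothetical choices made only for illustration.

```python
def decompose_exponent(alpha, c, b, p, s):
    """Write ell * p^b = alpha * p^e * q^d, where ell = alpha * p^c and q = p^s.

    Uses the Euclidean division c + b = d*s + e with 0 <= e < s.
    """
    d, e = divmod(c + b, s)
    q = p ** s
    ell = alpha * p ** c
    # Sanity check of the identity used in the proof
    assert ell * p ** b == alpha * p ** e * q ** d
    return d, e

# p = 2, s = 3 (q = 8), ell = 1 * 2^2 = 4, b = 3: 4 * 2^3 = 32 = 1 * 2^2 * 8^1
print(decompose_exponent(alpha=1, c=2, b=3, p=2, s=3))  # (1, 2)
```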


A.4 Proof of Proposition 6

When $p = 2$, we prove Proposition 6 without resorting to Assumption 1. We use the fact that the polynomial $\Gamma(z)$ linking $x$ and $y^{-1}$ is irreducible in the construction proposed in [6,8]. For $p = 2$, $\mathcal{L}_a$ is reduced to powers of 2, namely $\mathcal{L}_a = \{p^u\}_{0 \le u \le s-a}$. So the proof consists in showing that the intersection

$$\Big(\sum_{u=0}^{s-a} \mathrm{GRS}_t(x^{p^u}, y^{p^u})\Big) \cap \Big(\sum_{u=s-a}^{2(s-a)} \mathrm{GRS}_t(x^{p^u}, y^{p^u})\Big)$$

is reduced to $\mathrm{GRS}_t(x^{p^{s-a}}, y^{p^{s-a}})$.

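Proof. We pick $v \in X$. There exist polynomials $R_{p^u}, Q_{p^{s-a+u}} \in \mathbb{F}_{q^m}[z]$ (with $0 \le u \le s - a$) of degree lower than $t$ such that

$$v_i = \sum_{u=0}^{s-a} y_i^{p^u} R_{p^u}(x_i^{p^u}) = \sum_{u=0}^{s-a} y_i^{p^{s-a+u}} Q_{p^{s-a+u}}(x_i^{p^{s-a+u}})$$

for all $0 \le i \le n - 1$. As $y_i = \Gamma(x_i)^{-1}$, we obtain polynomial relations in the $x_i$'s by multiplying by $\Gamma(x_i)^{p^{2(s-a)}}$. This yields $n$ relations

$$\sum_{u=0}^{s-a} \Gamma(x_i)^{p^{2(s-a)} - p^u} R_{p^u}(x_i^{p^u}) = \sum_{u=0}^{s-a} \Gamma(x_i)^{p^{2(s-a)} - p^{s-a+u}} Q_{p^{s-a+u}}(x_i^{p^{s-a+u}}).$$

We suppose here that the degree of this polynomial relation is lower than $n$, that is $(t-1)p^{2(s-a)} < n$, so that we can deduce the polynomial equality:

$$\sum_{u=0}^{s-a} \Gamma(z)^{p^{2(s-a)} - p^u} R_{p^u}(z^{p^u}) = \sum_{u=0}^{s-a} \Gamma(z)^{p^{2(s-a)} - p^{s-a+u}} Q_{p^{s-a+u}}(z^{p^{s-a+u}}) \qquad (7)$$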
Modulo $\Gamma(z)$, all polynomials vanish but one. This yields $Q_{p^{2(s-a)}}(z^{p^{2(s-a)}}) = 0 \bmod \Gamma(z)$. Thanks to Lemma 1, $\Gamma(z)$ divides $Q_{p^{2(s-a)}}(z^{p^{2(s-a)}}) = \big(F^{(u)}(Q_{p^{2(s-a)}})(z)\big)^{p^{2(s-a)}}$ (for $u = ms - 2(s-a)$). As $\Gamma(z)$ is irreducible, this entails that $\Gamma(z)$ divides $F^{(u)}(Q_{p^{2(s-a)}})(z)$. However, $F^{(u)}(Q_{p^{2(s-a)}})(z)$ has the same degree as $Q_{p^{2(s-a)}}(z)$, which has degree lower than $t$ (notation as in the proof of Theorem 2). Hence $F^{(u)}(Q_{p^{2(s-a)}})(z) = 0$, and so is its Frobenius: $Q_{p^{2(s-a)}} = 0$. Then we look at the new relation of type (7) and start over with the polynomial $Q_{p^{2(s-a)-1}}(z^{p^{2(s-a)-1}})$. The proof that $Q_{p^{2(s-a)-1}} = 0$ is identical. One after the other, we prove that all the polynomials $R_{p^u}, Q_{p^{s-a+u}}$ are zero except the matching polynomials $R_{p^{s-a}}$ and $Q_{p^{s-a}}$, which are equal, so that $v \in \mathrm{GRS}_t(x^{p^{s-a}}, y^{p^{s-a}})$. The problem when $p \neq 2$ is that the set $\mathcal{L}_a$ contains exponents which are not a pure power of $p$. □


Bivariate Polynomials Modulo Composites 
and Their Applications 


Dan Boneh and Henry Corrigan-Gibbs
Stanford University, Stanford CA 94305, USA 


Abstract. We investigate the hardness of finding solutions to bivariate 
polynomial congruences modulo RSA composites. We establish necessary 
conditions for a bivariate polynomial to be one-way, second preimage 
resistant, and collision resistant based on arithmetic properties of the 
polynomial. From these conditions we deduce a new computational as- 
sumption that implies an efficient algebraic collision-resistant hash func- 
tion. We explore the assumption and relate it to known computational 
problems. The assumption leads to (i) a new statistically hiding com- 
mitment scheme that composes well with Pedersen commitments, (ii) 
a conceptually simple cryptographic accumulator, and (iii) an efficient 
chameleon hash function. 

Keywords: Algebraic curves, bivariate polynomials, cryptographic com- 
mitments, Merkle trees. 


1 Introduction 

In this paper, we investigate the cryptographic properties of bivariate polynomials modulo random RSA composites $N = pq$. We ask: for which integer polynomials $f \in \mathbb{Z}[x,y]$ does the function $f: \mathbb{Z}_N \times \mathbb{Z}_N \to \mathbb{Z}_N$ defined by $f$ appear to be a one-way function, a second-preimage-resistant function, or a collision-resistant function? We say that a polynomial $f \in \mathbb{Z}[x,y]$ is one-way if the function $f: \mathbb{Z}_N \times \mathbb{Z}_N \to \mathbb{Z}_N$ defined by $f$ is one-way (Section 3.1). We similarly define second-preimage-resistance (Section 3.2) and collision-resistance (Section 3.3) of polynomials $f \in \mathbb{Z}[x,y]$.

Using tools from algebraic geometry, we develop a heuristic for deducing the cryptographic properties of a bivariate polynomial over $\mathbb{Z}_N$ from its arithmetic properties, namely from its properties as a polynomial over the rationals $\mathbb{Q}$. We give a number of necessary conditions for a bivariate polynomial to be one-way, second-preimage-resistant, or collision-resistant. We also provide examples of polynomials $f$ that appear to satisfy each of these properties, and we offer separations between these three classes.

Taking collision resistance as an example, we conjecture that a bivariate polynomial $f \in \mathbb{Z}[x,y]$ that defines an injective function $f: \mathbb{Q}^2 \to \mathbb{Q}$ gives a collision-resistant function $f: \mathbb{Z}_N^2 \to \mathbb{Z}_N$, where $N$ is a random RSA modulus of secret factorization (see Section 3.3). Constructing an explicit polynomial $f \in \mathbb{Z}[x,y]$ that is provably injective over the rationals is an open number-theoretic problem [?]. However, even relatively simple polynomials appear to be injective over $\mathbb{Q}^2$. For example, Don Zagier [13,34] conjectures that the polynomial $f_{ZAG}(x,y) := x^7 + 3y^7$, which we refer to as the Zagier polynomial, is injective over the rationals. Since the only apparent efficient strategy for finding collisions in $f_{ZAG}$ over $\mathbb{Z}_N$ is to find rational collisions and reduce them modulo $N$, we conjecture that $f_{ZAG}$ is collision resistant over $\mathbb{Z}_N$. To build confidence in this assumption, we discuss potential collision-finding strategies and relate them to existing number-theoretic problems.

Applications. We demonstrate that the existence of low-degree collision-resistant 
bivariate polynomials gives rise to very efficient instantiations of a number of 
cryptographic primitives. 

First, we derive a statistically hiding commitment scheme which is computa- 
tionally inexpensive to evaluate and composes naturally with Pedersen commit- 
ments. By “nesting” these new commitments inside of Pedersen commitments, 
we obtain an efficient zero-knowledge protocol for proving knowledge of an open- 
ing of a commitment which is nested inside of another commitment. Use of 
nested commitments reduces the length of transactions in an anonymous e-cash 
scheme [24] by roughly 70%.

Second, we demonstrate that the new commitment scheme, in conjunction 
with Merkle trees, can serve as a simple replacement for one-way accumula- 
tors. Though the communication complexity of our accumulator construction 
is asymptotically worse than that of strong-RSA accumulators [?] ($O(\log |S|)$ versus $O(1)$ for a set $S$ being accumulated), our construction has the benefit of being conceptually simple and easy to implement.

Third, from the same collision-resistant polynomial, we derive a new chameleon 
hash function, signature scheme, claw-free permutation family, and a variable- 
length algebraic hash function. 


2 Related Work 

Multivariate polynomials in $\mathbb{Z}_N$ have a long history in cryptography. For example, the security of the Ong-Schnorr-Shamir signature scheme [26] followed from the hardness of finding solutions to a particular type of bivariate polynomial equation over $\mathbb{Z}_N$. Pollard and Schnorr later demonstrated a general attack against the hardness of finding solutions to such equations [28].

Shamir related the hardness of factoring certain multivariate polynomials 
modulo N to the problem of factoring the modulus N itself [33]. Schwenk and 
Eisfeld proposed encryption and signature schemes reliant on the hardness of 
finding roots of random univariate polynomials $f \in \mathbb{Z}[x]$ modulo a composite $N$, and they prove that this problem is as hard as factoring $N$ [31].

This work introduces a new statistically hiding commitment scheme based on 
low-degree polynomials. Commitment schemes are used widely in cryptography. 
Prior work has derived statistically hiding commitment schemes from the discrete log problem [?], the Paillier cryptosystem [?], and the RSA problem [3].
Verifying the correctness of opening a commitment in these existing schemes 
requires expensive modular exponentiations or elliptic curve scalar multiplica- 
tions. Verifying an opening with our new commitment scheme requires just a 
few modular multiplications. By combining our new commitment scheme with 
traditional Pedersen commitments, we improve the communication efficiency of 
the Zerocoin decentralized e-cash construction [24].

Given a Pedersen commitment and a finite set of elements $S$, our commitment scheme leads to a simple zero-knowledge protocol for proving knowledge of an opening $x$ of the commitment such that $x \in S$. The length of the proof is $\log |S|$. This technique, which uses Merkle trees [21], has applications to anonymous authentication and credential systems, and it has the potential to replace traditional RSA one-way accumulators, introduced by Benaloh and De Mare [5] and revisited by Barić and Pfitzmann [4].
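The Merkle-tree idea behind the $\log |S|$ proof length can be sketched with $f_{ZAG}$ as the two-to-one compression function on $\mathbb{Z}_N$. This is only a non-hiding skeleton with no commitments and no zero-knowledge. The function names, the odd-level padding rule, and the toy modulus are our own assumptions. Binding would rest on Assumption 11 for a properly generated $N$.

```python
def f_zag(x, y, N):
    # Compression Z_N x Z_N -> Z_N given by x^7 + 3y^7 mod N
    return (pow(x, 7, N) + 3 * pow(y, 7, N)) % N

def build_tree(leaves, N):
    # Returns all levels, leaves first; the last level holds the root
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur = cur + [cur[-1]]      # pad odd levels by repeating the last node
        levels.append([f_zag(cur[i], cur[i + 1], N) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, index):
    # Authentication path: one (sibling, sibling_is_right) pair per level
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sib = index ^ 1
        path.append((level[sib], sib > index))
        index //= 2
    return path

def verify(root, leaf, path, N):
    node = leaf
    for sibling, sibling_is_right in path:
        node = f_zag(node, sibling, N) if sibling_is_right else f_zag(sibling, node, N)
    return node == root

N = 1009 * 1013               # toy modulus, insecure
S = [11, 22, 33, 44, 55]      # a public set of elements of Z_N
levels = build_tree(S, N)
root = levels[-1][0]
proof = prove(levels, 3)      # membership proof for S[3] = 44
print(len(proof), verify(root, 44, proof, N))  # 3 True
```

The proof contains one node per tree level, that is $\lceil \log_2 |S| \rceil$ elements of $\mathbb{Z}_N$.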

Camenisch and Lysyanskaya presented an efficient zero-knowledge protocol 
which proves that a value contained in a Pedersen commitment is also contained in a particular strong-RSA accumulator [8]. The Camenisch-Lysyanskaya
accumulator produces a shorter proof of knowledge than ours does, but the con- 
ceptual simplicity and ease of implementation may make our Merkle-style proof 
more attractive for some applications. 

The “zero-knowledge sets” of Micali, Rabin, and Kilian solve an orthogonal 
problem: a prover publishes a commitment to a set $S$ and later can prove that $x \in S$ without leaking other information about $S$ [23]. In contrast, we are interested
in hiding the value x but allow the set of items S to be public. 


3 Cryptographic Properties of Polynomials 

We begin by surveying the cryptographic properties of integer polynomials mod- 
ulo random RSA composites. Our goal is to relate the algebraic properties of 
polynomials to their cryptographic complexity. In particular, we identify fami- 
lies of integer polynomials that give rise to progressively stronger cryptographic 
primitives: one-way functions, second-preimage-resistant functions, and collision- 
resistant functions. 

Notation. We write $x \xleftarrow{R} S$ to indicate that the variable $x$ takes on a value sampled independently and uniformly at random from a finite set $S$. A function $f: \mathbb{Z} \to \mathbb{R}^+$ is negligible if it is smaller than $1/p(\lambda)$ for every polynomial $p(\cdot)$ and all sufficiently large $\lambda$. We denote an arbitrary negligible function in $\lambda$ as $\mathrm{negl}(\lambda)$. We use the notation $f(x) := x^2$ to indicate the definition of a new term.

In what follows, we let $\mathrm{RSAgen}(\lambda)$ denote a randomized algorithm that runs in time polynomial in $\lambda$. The algorithm generates two random $\mathrm{len}(\lambda)$-bit primes $p$ and $q$ and outputs $(p, q, N := p \cdot q)$. Here $\mathrm{len}: \mathbb{Z}^+ \to \mathbb{Z}^+$ is some fixed function that determines the size of the primes $p$ and $q$ as a function of $\lambda$.
Let $f \in \mathbb{Z}[x,y]$ be a bivariate polynomial. For $c \in \mathbb{Z}$, consider the curve $f(x,y) = c$. The genus of this curve is a standard measure of its "complexity": conics have genus zero, elliptic curves have genus one, and so on (see, e.g., [?]). We define the genus of a polynomial $f$ as follows:

Definition 1. The genus of a polynomial $f \in \mathbb{Z}[x,y]$ is defined as

$$\max_{c \in \mathbb{Q}} \big(\mathrm{genus}(f(x,y) = c)\big).$$

As we will see, the genus of a polynomial $f$ has some relation to its cryptographic properties. While we focus on bivariate polynomials, most of the following discussion generalizes to multivariate polynomials.

We use the following terms throughout this section to describe relationships between curves. (For more precise definitions, see Hindry and Silverman [18, Sec. A.1.2].) A rational map from a curve $C$ to another curve $C'$ is a pair of rational functions $g$ and $h$ mapping points $(x,y)$ on $C$ to points $(g(x,y), h(x,y))$ on $C'$. A birational map from $C$ to $C'$ is a rational map which is a bijection between points on $C$ and $C'$ such that the map's inverse is also rational. Two curves $C$ and $C'$ are birationally equivalent if there is a birational map from $C$ to $C'$. An automorphism is a birational map from a curve to itself.

3.1 One-Way Polynomials 

One-way functions are the basis of much of cryptography. A function $f: X \to Y$ is one-way if, given the image $c = f(x)$ of a random point $x \in X$, it is hard to find an $x'$ such that $f(x') = c$. We first ask: what polynomials give rise to one-way functions?

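Definition 2. A polynomial $f$ in $\mathbb{Z}[x_1, \ldots, x_\ell]$ is one-way if for every p.p.t. algorithm $\mathcal{A}$ the following advantage is a negligible function of $\lambda$:

$$\mathrm{Adv}_{\mathcal{A},f}(\lambda) := \Pr\big[N \leftarrow \mathrm{RSAgen}(\lambda),\ x \xleftarrow{R} (\mathbb{Z}_N)^\ell,\ c \leftarrow f(x) :\ f(\mathcal{A}(N,c)) = c \text{ in } \mathbb{Z}_N\big].$$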

Clearly linear polynomials are not one-way. A result of Pollard and Schnorr [28] 
shows that quadratic polynomials, indeed all genus zero polynomials, are not 
one-way. 
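The experiment in Definition 2 can be made concrete with a toy harness. In this sketch, `rsagen` is a hypothetical stand-in for $\mathrm{RSAgen}(\lambda)$ built on a Miller-Rabin test written for the example. It uses tiny, insecure parameters. The adversary for $f(x,y) = x + y$ illustrates why linear polynomials are not one-way: it wins every run.

```python
import random

def is_probable_prime(n, rounds=20):
    # Miller-Rabin primality test (probabilistic)
    if n < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if n % small == 0:
            return n == small
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False          # a is a witness that n is composite
    return True

def rsagen(bits):
    # Toy stand-in for RSAgen(lambda): two distinct random `bits`-bit primes
    def rand_prime():
        while True:
            c = random.getrandbits(bits) | (1 << (bits - 1)) | 1
            if is_probable_prime(c):
                return c
    p = rand_prime()
    q = rand_prime()
    while q == p:
        q = rand_prime()
    return p, q, p * q

def one_way_experiment(f, adversary, bits=64):
    # One run of the Definition 2 game for a bivariate f; True iff A wins
    _, _, N = rsagen(bits)
    x = (random.randrange(N), random.randrange(N))
    c = f(*x, N)
    x_prime = adversary(N, c)
    return f(*x_prime, N) == c

def f_linear(x, y, N):
    return (x + y) % N

def invert_linear(N, c):
    # For f(x, y) = x + y, the point (c, 0) is always a preimage of c
    return (c, 0)

wins = sum(one_way_experiment(f_linear, invert_linear) for _ in range(100))
print(wins)  # 100
```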

Theorem 3. A genus zero polynomial $f \in \mathbb{Z}[x,y]$ is not one-way.

Proof sketch. For all $c \in \mathbb{Q}$, the curve $f(x,y) = c$ is of genus zero, or is a product of genus zero curves. A genus zero curve is birationally equivalent to a linear or quadratic curve $f(x,y) = 0$ [18, Theorem A.4.3.1]. If $f(x,y)$ is linear in one of the variables $x$ or $y$, then finding points on this curve is easy, thereby breaking the one-wayness of $f$. This leaves the case where $f(x,y)$ is quadratic in both $x$ and $y$. Let $N$ be an output of $\mathrm{RSAgen}(\lambda)$. Let $f \in \mathbb{Z}[x,y]$ be a quadratic polynomial in $x$ and $y$ and let $c \in \mathbb{Z}_N$. There is an efficient algorithm that, for most $c \in \mathbb{Z}_N$, finds an $(x_0, y_0) \in \mathbb{Z}_N^2$ such that $f(x_0, y_0) = c$ in $\mathbb{Z}_N$, breaking the one-wayness of $f$. See, for example, [6, Sec. 5.2] for a description of the algorithm. □
Theorem 3 played an important role in analyzing the security of the Ong-Schnorr-Shamir signature scheme [26]. The scheme depended on the difficulty of finding solutions $(x, y)$ to the equation:

$$x^2 + ay^2 = b \text{ in } \mathbb{Z}_N$$

for known constants $a, b \in \mathbb{Z}_N$. Since this equation defines a genus-zero curve, Theorem 3 shows that it is possible to efficiently find solutions without knowledge of the factors of $N$. Pollard and Schnorr demonstrated an attack against the scheme soon after its publication [28,32].

One-way Polynomials. It is not known how to break the one-wayness of polynomials $f \in \mathbb{Z}[x,y]$ that are not genus zero. Thus, for example, even a simple polynomial such as $f(x,y) = y^2 - x^3$ may be one-way, although that would require further study.


3.2 Second Preimage Resistant Polynomials 

A function $f: U \to V$ is second preimage resistant if, given $u \in U$, it is difficult to find a $u' \neq u \in U$ such that $f(u) = f(u')$. We define a similar notion for polynomials:

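Definition 4. A polynomial $f$ in $\mathbb{Z}[x_1, \ldots, x_\ell]$ is second preimage resistant if for every p.p.t. algorithm $\mathcal{A}$ the following advantage is a negligible function of $\lambda$:

$$\mathrm{Adv}_{\mathcal{A},f}(\lambda) := \Pr\big[N \leftarrow \mathrm{RSAgen}(\lambda),\ x \xleftarrow{R} (\mathbb{Z}_N)^\ell,\ x' \leftarrow \mathcal{A}(N,x) :\ f(x) = f(x') \text{ in } \mathbb{Z}_N \wedge x \neq x'\big].$$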

Since genus 0 polynomials are not one-way they are also not second preimage 
resistant. It is similarly straightforward to show that no genus-one polynomial
is second preimage resistant. 

Proposition 5. A genus one polynomial $f \in \mathbb{Z}[x,y]$ is not second preimage resistant.

To see why, let $f \in \mathbb{Z}[x,y]$ be a polynomial such that $f(x,y) = c$ is a curve of genus one for all but finitely many $c \in \mathbb{Q}$. Then $f$ is not second preimage resistant because of the group structure on elliptic curves. That is, let $N$ be an output of $\mathrm{RSAgen}(\lambda)$. Choose a random pair $(x_0, y_0) \in \mathbb{Z}_N^2$ and set $c := f(x_0, y_0) \in \mathbb{Z}_N$. Then $P = (x_0, y_0)$ is a point on the curve $f(x,y) = c$, and so is the point $2P = P + P$, where addition refers to the elliptic curve group operation. With overwhelming probability $2P$ is not the point at infinity, and therefore, given $P$ as input, the adversary can output $2P$ as a second preimage for $P$. It follows that $f$ is not second preimage resistant.
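The doubling attack can be sketched for the special case of a polynomial that is already in short Weierstrass form, $f(x,y) = y^2 - x^3 - Ax$. This choice is ours: a general genus-one $f$ would first need a birational map to such a model, which we do not implement. For $c = f(x_0, y_0)$, the curve $f(x,y) = c$ is $y^2 = x^3 + Ax + c$. The tangent-line doubling formulas work modulo $N$ as long as $2y_0$ is invertible, and a non-invertible denominator could reveal a factor of $N$. The modulus and point are hypothetical toy values.

```python
import math

def f_weierstrass(x, y, A, N):
    # f(x, y) = y^2 - x^3 - A*x; the curve f(x, y) = c is y^2 = x^3 + A*x + c
    return (y * y - x ** 3 - A * x) % N

def double_point(x0, y0, A, N):
    # Tangent-line doubling on y^2 = x^3 + A*x + c; c does not enter the formulas
    denom = (2 * y0) % N
    if math.gcd(denom, N) != 1:
        raise ValueError("2*y0 is not invertible mod N")
    lam = (3 * x0 * x0 + A) * pow(denom, -1, N) % N
    x2 = (lam * lam - 2 * x0) % N
    y2 = (lam * (x0 - x2) - y0) % N
    return x2, y2

# Toy parameters: N = 1009 * 1013 (far too small for real use), A = 7
N, A = 1009 * 1013, 7
P = (123456, 654321)
P2 = double_point(*P, A, N)
assert P2 != P
print(f_weierstrass(*P, A, N) == f_weierstrass(*P2, A, N))  # True
```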

Even polynomials that give higher genus curves need not be second preimage resistant. For example, a hyperelliptic polynomial of genus $g \ge 2$ has the form $f(x,y) = y^2 - h(x) \in \mathbb{Z}[x,y]$, where $h \in \mathbb{Z}[x]$ is a polynomial of degree $2g + 1$ or $2g + 2$. The simple fact that $f(x_0, y_0) = f(x_0, -y_0)$ immediately gives a second preimage attack on these polynomials: given $(x_0, y_0)$, the attacker outputs $(x_0, -y_0)$ as a second preimage.

The fact that all curves of genus two are hyperelliptic [18, Theorem A.4.5.1] leads to the following proposition:

Proposition 6. A genus two polynomial $f \in \mathbb{Z}[x,y]$ is not second preimage resistant.

This proposition, in combination with Theorem 3 and Proposition 5, means that all second preimage resistant polynomials must have genus at least three.

As outlined above, elliptic (genus one) and hyperelliptic (genus two) polynomials are not second preimage resistant because there are non-trivial automorphisms on the associated curves. We say that a polynomial $f \in \mathbb{Z}[x,y]$ is automorphism free if, for all but finitely many $c \in \mathbb{Q}$, the curve $f(x,y) = c$ has no automorphisms over $\mathbb{Q}$ apart from the trivial map $(x, y) \mapsto (x, y)$. It is natural to conjecture that every automorphism-free polynomial $f \in \mathbb{Z}[x,y]$ is second preimage resistant.

Poonen constructs a large family of automorphism-free polynomials in arbitrarily many variables and of arbitrarily large degree [29]. For example, he proves that the polynomial $f(x,y) = x^3 + xy^3 + y^4$ is automorphism-free over the rationals [29].

A Historical Aside: $q$-Way Preimage Resistance. A stronger notion of preimage resistance for a function $f: U \to V$, called $q$-way preimage resistance, states that given a random $v \in V$ and random points $u_1, \ldots, u_q$ in $U$ such that $v = f(u_1) = \cdots = f(u_q)$, it is difficult to find a new point $u \in U \setminus \{u_1, \ldots, u_q\}$ such that $f(u) = v$.

As before, one can define a similar property for polynomials. That is, a polynomial $f$ in $\mathbb{Z}[x,y]$ is $q$-way preimage resistant if, for a random RSA modulus $N$ and a random $c \in \mathbb{Z}_N$, given $q$ points on the curve $f(x,y) = c$ in $\mathbb{Z}_N$, it is hard to find another point on this curve.

Kilian and Petrank [?] proposed an authentication scheme whose security is based on the $q$-way preimage resistance of the polynomial $f_{KP}(x,y) = x^e - y^e$, for some small odd $e$, say $e = 17$. In their scheme, $q$ is the total number of users in the system. Naor [25] refers to the computational assumption that $f_{KP}$ is $q$-way preimage resistant as the Difference RSA Assumption. We note that the polynomial $f_{KP}$ is not even second preimage resistant, because there is a non-trivial automorphism $(x, y) \mapsto (-y, -x)$ on the curve. In other words, for any point $(x_0, y_0)$ we have that $f_{KP}(x_0, y_0) = f_{KP}(-y_0, -x_0)$. This bad symmetry appears to violate the security properties needed for the Kilian-Petrank identification scheme, but the scheme can be modified to resist such attacks.
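This symmetry gives a one-line second-preimage attack. The minimal sketch below uses $e = 17$ as in the text. The modulus and starting point are hypothetical toy values.

```python
def f_kp(x, y, e, N):
    # f_KP(x, y) = x^e - y^e mod N
    return (pow(x, e, N) - pow(y, e, N)) % N

def kp_second_preimage(x0, y0, N):
    # The automorphism (x, y) -> (-y, -x) preserves f_KP when e is odd
    return (-y0) % N, (-x0) % N

N, e = 1009 * 1013, 17   # toy modulus, insecure
x0, y0 = 31337, 4242
x1, y1 = kp_second_preimage(x0, y0, N)
print((x1, y1) != (x0, y0), f_kp(x0, y0, e, N) == f_kp(x1, y1, e, N))  # True True
```

The output differs from the input whenever $x_0 \not\equiv -y_0 \pmod N$, which holds for almost every random input.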

Camenisch and Stadler [?, Sec. 6] used a similar assumption to construct group signatures. They need the polynomial $f_{CS}(x,y) = x^{e_1} + ay^{e_2}$ to be $q$-way preimage resistant for some small $e_1$ and $e_2$. They propose using $e_1 = 5$ and $e_2 = 3$. We observe in the next section that the polynomial $f(x,y) = x^5 + ay^3$ is not collision resistant. Nevertheless, it may be second preimage resistant.


3.3 Collision-Resistant Polynomials 

A function $f: U \to V$ is collision resistant if it is difficult to find a pair $u \neq u' \in U$ such that $f(u) = f(u')$. We define a similar notion for polynomials:

Definition 7. For a polynomial $f$ and an integer $N$, we say that $x, y \in (\mathbb{Z}_N)^\ell$ are an $N$-collision for $f$ if $f(x) = f(y)$ in $\mathbb{Z}_N$ and $x \neq y$.

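Definition 8. A polynomial $f$ in $\mathbb{Z}[x_1, \ldots, x_\ell]$ is collision resistant if for every p.p.t. algorithm $\mathcal{A}$ the following advantage is a negligible function of $\lambda$:

$$\mathrm{Adv}_{\mathcal{A},f}(\lambda) := \Pr\big[N \leftarrow \mathrm{RSAgen}(\lambda) :\ \mathcal{A}(N) \text{ is an } N\text{-collision for } f\big].$$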

In the previous two subsections, we observed that polynomials $f \in \mathbb{Z}[x,y]$
which are of genus g < 2 or which are hyperelliptic, are not second preimage 
resistant and thus are not collision resistant. 

Even polynomials that are second preimage resistant are not necessarily collision resistant. For example, in Section 3.2 we suggested that the polynomial $f(x,y) = x^3 + xy^3 + y^4$ may be second preimage resistant. However, it is certainly not collision resistant, since for any nonzero $r \in \mathbb{Q}$ the points $(r^4, 0)$ and $(0, r^3)$ constitute a collision: both evaluate to $r^{12}$.

Attacking Collision Resistance over the Rationals. Suppose that a polynomial $f \in \mathbb{Z}[x_1, \ldots, x_\ell]$ has a rational collision. That is, there are rational points $x_0 \neq x_1$ in $\mathbb{Q}^\ell$ such that $f(x_0) = f(x_1)$. Then, for most¹ RSA moduli $N$, the points $x_0$ and $x_1$ give a collision for $f$ in $\mathbb{Z}_N$. This breaks the collision resistance of $f$ when the security parameter $\lambda$ is sufficiently large. Indeed, for sufficiently large $\lambda$, the attack algorithm can construct the fixed rational points $x_0$ and $x_1$ by exhaustive search and obtain collisions for $f$ for most RSA moduli output by $\mathrm{RSAgen}(\lambda)$.
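The reduction of a rational collision modulo $N$ can be illustrated with the collision $(r^4, 0)$, $(0, r^3)$ for $f(x,y) = x^3 + xy^3 + y^4$ discussed above. In this sketch, `to_zn` maps a fraction into $\mathbb{Z}_N$ by inverting its denominator. That is the only step that can fail, and it fails only when $N$ shares a factor with a denominator (footnote 1). The value of $r$ and the modulus are hypothetical toy choices.

```python
from fractions import Fraction

def to_zn(r, N):
    # Map a rational a/b into Z_N; requires gcd(b, N) = 1
    return r.numerator * pow(r.denominator, -1, N) % N

def f(x, y, N):
    # f(x, y) = x^3 + x*y^3 + y^4 evaluated in Z_N
    return (x ** 3 + x * y ** 3 + y ** 4) % N

r = Fraction(2, 3)
P0 = (r ** 4, Fraction(0))   # (r^4, 0)
P1 = (Fraction(0), r ** 3)   # (0, r^3); both evaluate to r^12 over Q

N = 1009 * 1013              # toy modulus
Q0 = tuple(to_zn(v, N) for v in P0)
Q1 = tuple(to_zn(v, N) for v in P1)
print(Q0 != Q1, f(*Q0, N) == f(*Q1, N))  # True True
```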

The discussion above shows that if a polynomial $f \in \mathbb{Z}[x_1, \ldots, x_\ell]$ has a rational collision, then $f$ is not collision resistant. We summarize this in the following proposition.

Proposition 9. If a polynomial $f \in \mathbb{Z}[x_1, \ldots, x_\ell]$ is collision resistant, then the function $f: \mathbb{Q}^\ell \to \mathbb{Q}$ must be injective.

If $f \in \mathbb{Z}[x_1, \ldots, x_\ell]$ defines an injective function from $\mathbb{Q}^\ell$ to $\mathbb{Q}$, then $f$ is said to be an injective polynomial. Proposition 9 shows that the search for collision-resistant polynomials must begin with the search for an injective polynomial over the rationals.

¹ The points $x_0$ and $x_1$ give a collision in $\mathbb{Z}_N$ whenever $N$ is relatively prime to their denominators and $x_0 \not\equiv x_1 \bmod N$. This holds with overwhelming probability for sufficiently large $\lambda$.
Injective Polynomials. Even the existence of bivariate injective polynomials is currently an open problem. Poonen [30] shows that they exist under certain number-theoretic conjectures. Moreover, Poonen [30, Lemma 2.3] shows that if $f \in \mathbb{Z}[x,y]$ has only a finite number of rational collisions, then one can use $f$ to construct an injective polynomial $g \in \mathbb{Z}[x,y]$ by pre-composing $f$ with a suitable polynomial map. In other words, an "almost" injective polynomial can be converted to an injective one.

Although proving that a particular polynomial is injective over $\mathbb{Q}$ is currently out of reach, there are simple polynomials that appear to have this property. In particular, Don Zagier² conjectures that the polynomial $f_{ZAG}(x,y) := x^7 + 3y^7$ (the "Zagier polynomial") defines an injective function from $\mathbb{Q}^2$ to $\mathbb{Q}$. As indirect evidence, Cornelissen [13, Remarque 10] and Poonen [30, Remark 1.7] remark that the four-variate generalization of the abc conjecture [7] implies that $f(x,y) = x^e + 3y^e$ is injective over the rationals for "sufficiently large" odd integers $e$. Experimentally, we have confirmed that there are no rational collisions in $f_{ZAG}$ for rationals with height less than 100.
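A brute-force version of this experiment is easy to sketch. We assume the height of $a/b$ in lowest terms is $\max(|a|, |b|)$, which may differ from the authors' convention. We also use a far smaller bound than 100 so that the search runs instantly. This is not the authors' search code.

```python
from fractions import Fraction
from itertools import product
from math import gcd

def rationals_up_to_height(H):
    # All a/b in lowest terms with max(|a|, b) <= H, including 0
    seen = {Fraction(0)}
    for b in range(1, H + 1):
        for a in range(-H, H + 1):
            if a != 0 and gcd(a, b) == 1:
                seen.add(Fraction(a, b))
    return sorted(seen)

def f_zag(x, y):
    return x ** 7 + 3 * y ** 7

def rational_collisions(H, poly=f_zag):
    # Map each value poly(x, y) to the first point that produced it
    first_point = {}
    collisions = []
    for x, y in product(rationals_up_to_height(H), repeat=2):
        v = poly(x, y)
        if v in first_point:
            collisions.append((first_point[v], (x, y)))
        else:
            first_point[v] = (x, y)
    return collisions

print(len(rational_collisions(6)))  # 0
```

The same search immediately finds collisions for a non-injective polynomial such as $x^7 + y^7$, which is a quick sanity check that it works.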

$\ell$-Variate Injective Polynomials over $\mathbb{Q}$ from Merkle-Damgård. Given a bivariate injective polynomial over $\mathbb{Q}$, it is possible to construct $\ell$-variate injective polynomials over $\mathbb{Q}$ for every $\ell > 2$ using the Merkle-Damgård construction for collision-resistant hash functions [15,22]. For example, applying one step of Merkle-Damgård to $f_{ZAG}$ shows that if $f_{ZAG}$ is injective, then so is the following three-variate polynomial:


$$g(x, y, z) = (x^7 + 3y^7)^7 + 3z^7.$$
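Iterating the same step gives $g(x_1, \ldots, x_\ell) = f_{ZAG}(\cdots f_{ZAG}(f_{ZAG}(x_1, x_2), x_3) \cdots, x_\ell)$. This iterated form is our reading of "applying Merkle-Damgård" for general $\ell$, since the text only writes out $\ell = 3$. A short sketch over the integers:

```python
from functools import reduce

def f_zag(x, y):
    return x ** 7 + 3 * y ** 7

def md_chain(*xs):
    # g(x1, ..., xl) = f(...f(f(x1, x2), x3)..., xl)
    if len(xs) < 2:
        raise ValueError("need at least two inputs")
    return reduce(f_zag, xs)

x, y, z = 2, -1, 5
print(md_chain(x, y, z) == (x ** 7 + 3 * y ** 7) ** 7 + 3 * z ** 7)  # True
```

The same function works unchanged on `fractions.Fraction` inputs, which matches the setting over $\mathbb{Q}$.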

Injective Polynomials and Collision Resistance. Proposition 9 states that, for a polynomial $f$ to be collision resistant over $\mathbb{Z}_N$, $f$ must be injective over the rationals. The following conjecture asserts the converse: injectivity over the rationals is sufficient for collision resistance.

Conjecture 10. If $f \in \mathbb{Z}[x_1, \ldots, x_\ell]$ is injective over $\mathbb{Q}$, then $f$ is collision resistant.

This conjecture is based on the intuition that the only efficient way to find collisions in $f$ over $\mathbb{Z}_N$ is to find collisions in $f$ over $\mathbb{Q}$. Since collisions over $\mathbb{Q}$ do not exist, it may be difficult to find collisions over $\mathbb{Z}_N$.

We only state Conjecture 10 to stimulate further research on this topic. The conjecture is not needed for this paper. For the applications described in this paper, we only need the collision resistance of an explicit low-degree polynomial in $\mathbb{Z}[x,y]$. Nevertheless, if Conjecture 10 is true, it would give a clean characterization of collision resistant polynomials in terms of their arithmetic properties. For the applications in this paper, the following assumption suffices.


² Gunther Cornelissen attributes to Don Zagier the suggestion that $f(x,y) = x^7 + 3y^7$ is collision-free over the rationals [13, Remarque 10].
Assumption 11. The Zagier polynomial $f_{ZAG}(x,y) = x^7 + 3y^7 \in \mathbb{Z}[x,y]$ is collision resistant.

We see that breaking Assumption 11 would either (a) resolve a 15-year-old open number-theoretic problem by showing that $f_{ZAG}$ is non-injective, or (b) find $\mathbb{Z}_N$ collisions that are not rational collisions. We next review two potential avenues for attacks of type (b) and discuss why they do not apply.

Attack Strategy I: Related Non-injective Polynomials over $\mathbb{Q}$. One potential avenue for attacking the collision resistance of $f_{ZAG}$ in $\mathbb{Z}_N$ is to look for a polynomial $h \in \mathbb{Z}[x,y]$ such that

$$g(x,y) := f(x,y) + N \cdot h(x,y)$$

is not injective over $\mathbb{Q}$. If $(x_0, y_0)$ and $(x_1, y_1)$ in $\mathbb{Q}^2$ are a rational collision for $g$, then by reducing this pair modulo $N$ we may³ obtain a $\mathbb{Z}_N$ collision for $f(x,y)$. We say that $h$ is "useful" if there exists a rational collision for $g(x,y)$ that gives a $\mathbb{Z}_N$ collision for $f(x,y)$. It is easy to show that there are many useful polynomials $h$: every $\mathbb{Z}_N$ collision for $f(x,y)$ gives a useful polynomial $h$. However, we do not know how to construct a useful $h$ given just $f$ and $N$. Furthermore, even if efficiently constructing a useful $h$ is possible, the attack algorithm will need to find a rational collision on the resulting $g$, and this may not be feasible in polynomial time.

Attack Strategy II: Algebraic Extensions. Another avenue for attacking the collision resistance of $f_{ZAG}$ in $\mathbb{Z}_N$ is via algebraic extensions. Let $g$ be an irreducible polynomial in $\mathbb{Z}[x]$ and consider the number field $K = \mathbb{Q}[x]/(g)$. Suppose the adversary constructs $g$ so that it knows an efficiently computable map $\rho: K \to \mathbb{Z}_N$ (this can be done by choosing the polynomial $g$ so that the adversary knows a zero of $g$ in $\mathbb{Z}_N$). Now, even if $f_{ZAG}$ is injective as a function $\mathbb{Q}^2 \to \mathbb{Q}$, it may not be injective as a function $K^2 \to K$. For example, $f_{ZAG}$ is not injective over the extension $K = \mathbb{Q}[\sqrt[7]{3}]$: the points $(\sqrt[7]{3}, 0)$ and $(0, 1)$ are a collision, since both evaluate to 3. If the adversary could find a collision of $f_{ZAG}$ in $K^2$, this collision may lead to a $\mathbb{Z}_N$ collision for $f_{ZAG}$. However, for a random RSA modulus $N$, it is not known how to efficiently construct an extension $K$ such that (i) $f_{ZAG}: K^2 \to K$ is not injective, and (ii) the adversary has an efficiently computable map $\rho: K \to \mathbb{Z}_N$.

Assumption 11 merits further analysis, and we hope that this work will stimulate further research on this question.

Non-collision Resistant Polynomials. Simple variations of Zagier's polynomial are trivially not injective and therefore not collision resistant. For example, the polynomials

$$f_1(x,y) = x^7 + y^7 \quad\text{and}\quad f_2(x,y) = x^7 + 2y^7$$

³ If $(x_0, y_0)$ and $(x_1, y_1)$ happen to reduce to the same point modulo $N$, or if one of the denominators is not relatively prime to $N$, then this rational collision for $g$ does not give a $\mathbb{Z}_N$ collision for $f$.
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in Z [x,y\ are not collision resistant. The polynomial /i is not injective because 
for all xo 7^ |/o in Z the points (#o,2/o) and (; yo,xo ) are a collision for /i. The 
polynomial is not collision resistant because for all t ^ 0 in Z the points 
(— £, 0) and (£, —t) are a collision for /2. 

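Similarly, polynomials of the form $f(x, y) = x^{e_1} + b y^{e_2} \in \mathbb{Z}[x, y]$ for some $b \in \mathbb{Z}$ 
where $\gcd(e_1, e_2) = 1$ are not injective and therefore not collision resistant. To 
see why, observe that if the equation $\alpha e_1 - \beta e_2 = 1$ has two distinct non-negative integer 
solutions $(\alpha_0, \beta_0)$ and $(\alpha_1, \beta_1)$, then $(b^{\alpha_0}, b^{\beta_1})$ and $(b^{\alpha_1}, b^{\beta_0})$ are a collision for f: 
both points map to $b^{1 + \beta_0 e_2} + b^{1 + \beta_1 e_2}$, and for $|b| \geq 2$ the two points are distinct. 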
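As a quick sanity check, the Python sketch below evaluates these polynomials at the colliding points over the integers, so each collision is also a collision modulo any N. The helper names `f1`, `f2`, `f_gen`, and `two_solutions` are our own illustrative choices. For $x^{e_1} + b y^{e_2}$ it uses the points $(b^{\alpha_0}, b^{\beta_1})$ and $(b^{\alpha_1}, b^{\beta_0})$ built from two non-negative solutions of $\alpha e_1 - \beta e_2 = 1$; both evaluate to $b^{1+\beta_0 e_2} + b^{1+\beta_1 e_2}$. The parameters $e_1 = 3$, $e_2 = 5$, $b = 2$ are arbitrary.

```python
# Sketch: check explicit integer collisions for the non-injective polynomials.
# Helper names are illustrative, not taken from the paper.

def f1(x, y):
    return x**7 + y**7

def f2(x, y):
    return x**7 + 2 * y**7

def f_gen(x, y, e1, e2, b):
    """f(x, y) = x^e1 + b*y^e2 over the integers."""
    return x**e1 + b * y**e2

def two_solutions(e1, e2):
    """Two distinct non-negative solutions (alpha, beta) of
    alpha*e1 - beta*e2 = 1, assuming gcd(e1, e2) = 1."""
    alpha0 = next(a for a in range(1, e2 + 1) if (a * e1 - 1) % e2 == 0)
    beta0 = (alpha0 * e1 - 1) // e2
    # Adding (e2, e1) to a solution gives another solution.
    return (alpha0, beta0), (alpha0 + e2, beta0 + e1)

# f1: swapping the coordinates gives a collision.
assert f1(2, 5) == f1(5, 2)

# f2: (-t, 0) and (t, -t) both map to -t^7.
t = 3
assert f2(-t, 0) == f2(t, -t) == -t**7

# x^3 + 2*y^5: (b^alpha0, b^beta1) and (b^alpha1, b^beta0) collide.
e1, e2, b = 3, 5, 2
(alpha0, beta0), (alpha1, beta1) = two_solutions(e1, e2)
P, Q = (b**alpha0, b**beta1), (b**alpha1, b**beta0)
assert P != Q and f_gen(*P, e1, e2, b) == f_gen(*Q, e1, e2, b)
print(P, Q, f_gen(*P, e1, e2, b))  # (4, 16) (128, 2) 2097216
```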

Random Self-reduction. Finally, we mention that the collision-finding problem 
for the family of polynomials $\{x^e + a y^e\}_{a \in \mathbb{Z}_N}$ has a random self-reduction. 
Given a collision-finding algorithm $\mathcal{A}(N, a)$ that outputs a $\mathbb{Z}_N$ collision in $x^e + a y^e$ 
for a non-negligible fraction of choices of $a \in \mathbb{Z}_N$, it is possible to construct a 
collision-finding algorithm $\mathcal{B}(N, a)$ that finds collisions for every choice of a with 
high probability. On input (N, a), algorithm $\mathcal{B}$ chooses a random $r \leftarrow \mathbb{Z}_N$ and 
calls $\mathcal{A}(N, r^e a)$. When $\mathcal{A}$ outputs the collision $(x_0, y_0), (x_1, y_1)$, algorithm $\mathcal{B}$ 
obtains the following collision on the original polynomial: $(x_0, r y_0), (x_1, r y_1)$, since 
$x_i^e + (r^e a) y_i^e = x_i^e + a (r y_i)^e$. If $\mathcal{A}$ fails, then $\mathcal{B}$ can try again with a fresh random 
choice of $r \in \mathbb{Z}_N$. After an expected polynomial number of iterations, algorithm $\mathcal{B}$ will 
find a collision for the given polynomial $x^e + a y^e$. 
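The reduction can be exercised end to end on a toy modulus. In this hedged sketch, `toy_oracle` is a hypothetical stand-in for $\mathcal{A}$ that succeeds only on an arbitrary subset of inputs (roughly a third) and otherwise brute-forces a collision, which is feasible only because N = 187 is tiny; `self_reduce` plays the role of $\mathcal{B}$. One small assumption beyond the text: r is drawn from $\mathbb{Z}_N^*$ rather than $\mathbb{Z}_N$, so that $y \mapsto ry$ is a bijection and distinct oracle collisions map to distinct points.

```python
import math
import random

def toy_oracle(N, a, e=7):
    """Hypothetical stand-in for algorithm A. It 'works' only when a % 3 == 0
    (an arbitrary toy rule) and then brute-forces a collision in
    x^e + a*y^e mod N, which is feasible only because N is tiny."""
    if a % 3 != 0:
        return None
    seen = {}
    for x in range(N):
        for y in range(N):
            v = (pow(x, e, N) + a * pow(y, e, N)) % N
            if v in seen:
                return seen[v], (x, y)
            seen[v] = (x, y)
    return None  # unreachable: N^2 inputs cannot map injectively to N values

def self_reduce(N, a, oracle, e=7, seed=1, max_tries=1000):
    """Algorithm B: re-randomize a as r^e * a, call the oracle, and map the
    collision back with (x, y) -> (x, r*y)."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        r = rng.randrange(1, N)
        if math.gcd(r, N) != 1:
            continue  # keep r invertible so (x, y) -> (x, r*y) is injective
        found = oracle(N, pow(r, e, N) * a % N, e)
        if found is None:
            continue  # oracle failed on this instance; retry with a fresh r
        (x0, y0), (x1, y1) = found
        return (x0, r * y0 % N), (x1, r * y1 % N)
    raise RuntimeError("no collision found")

N, a = 11 * 17, 5  # toy modulus and target coefficient

def f(x, y):
    return (pow(x, 7, N) + a * pow(y, 7, N)) % N

P, Q = self_reduce(N, a, toy_oracle)
assert P != Q and f(*P) == f(*Q)
```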

4 A Nestable Commitment Scheme from Polynomials over $\mathbb{Z}_N$

Having argued that it is infeasible to find collisions in the function $f_{\mathrm{ZAG}}(x, y) = 
x^7 + 3y^7 \bmod N$ (Assumption 11), we now turn to the cryptographic applications 
of this new computational assumption. In this section, we demonstrate that the 
collision resistance of $f_{\mathrm{ZAG}}$ leads to a commitment scheme where the procedure for 
verifying that a commitment was opened correctly uses only low-degree polynomials. 
The new commitment scheme is statistically hiding and its computational 
binding property is based on Assumption 11. 

The commitment scheme composes naturally with zero-knowledge proofs of 
knowledge involving Pedersen commitments. In particular, given a Pedersen 
commitment C to one of our low-degree commitments, there is a succinct zero- 
knowledge protocol which proves knowledge of an opening of an opening of C. 
We call the inner commitment scheme nestable , since it can be efficiently nested 
inside of a Pedersen commitment. We discuss applications of nestable commitments 
in Sections 4.4 and 5. 


4.1 Commitments 

A commitment scheme is a tuple of efficient algorithms (Setup, Commit, Open), 
with the following functionalities: 

Setup(λ) → pp. The Setup routine is a randomized algorithm that runs in time 
polynomial in λ and returns public parameters pp. These parameters define 
a message space $\mathcal{M}$, a space of random blinding values $\mathcal{R}$, and a space of 
commitments $\mathcal{C}$. The following algorithms take the public parameters pp as 
an implicit argument. 

Commit(m) → (c, r). Given a message $m \in \mathcal{M}$, return a commitment $c \in \mathcal{C}$ and 
a random blinding value $r \in \mathcal{R}$ used to open the commitment. 

Open(c, m, r) → {0, 1}. Given a commitment c, a message m, and a blinding 
value r, return “1” if (m, r) is a valid opening of c and “0” otherwise. 

For correctness, we require that, for all $m \in \mathcal{M}$: 

$\Pr[pp \leftarrow \mathrm{Setup}(\lambda);\ (c, r) \leftarrow \mathrm{Commit}(m) : \mathrm{Open}(c, m, r) = 1] \geq 1 - \mathrm{negl}(\lambda).$

A statistically hiding commitment scheme must satisfy two security properties: 

– Statistically Hiding. For any two messages $m_0$ and $m_1$ in $\mathcal{M}$, a commitment 
to $m_0$ is statistically indistinguishable from a commitment to $m_1$. 

– Computationally Binding. For any p.p.t. adversary $\mathcal{A}$, the adversary has 
negligible advantage in producing two different valid openings of the same 
commitment. More precisely, 

$\Pr[pp \leftarrow \mathrm{Setup}(\lambda);\ (c, m, r, m', r') \leftarrow \mathcal{A}(pp) : \mathrm{Open}(c, m, r) = 1 \wedge \mathrm{Open}(c, m', r') = 1 \wedge (m, r) \neq (m', r')] < \mathrm{negl}(\lambda).$

4.2 Construction 

The public parameters for our new commitment scheme consist only of an RSA 
modulus N, for which no one knows the factorization. To commit to a value $m \in \mathbb{Z}_N^*$, 
the committer samples a random blinding value r from $\mathbb{Z}_N^*$ and computes 
the value of $f_{\mathrm{ZAG}}$ at the point (m, r). 

The construction of the new commitment scheme follows. 

Setup(λ) → N. The value N is an RSA modulus: the product of two random 
len(λ)-bit primes p and q such that $\gcd((p-1)(q-1), 7) = 1$. The commitment 
space $\mathcal{C}$ is $\mathbb{Z}_N$. The message space $\mathcal{M}$ and the space of blinding values $\mathcal{R}$ 
are $\mathbb{Z}_N^*$. 

Commit(m) → (c, r). Choose a random blinding value $r \leftarrow \mathbb{Z}_N^*$ and set 
$c \leftarrow m^7 + 3r^7$ in $\mathbb{Z}_N$. Return c as the commitment and r as the commitment secret. 

Open(c, m, r) → {0, 1}. Output “1” if $m, r \in \mathbb{Z}_N^*$ and if $c = m^7 + 3r^7$ in $\mathbb{Z}_N$. 

Output “0” otherwise. 
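A minimal Python sketch of Setup, Commit, and Open follows, assuming tiny, publicly known primes p = 1019 and q = 1031 chosen only because $\gcd((p-1)(q-1), 7) = 1$. It illustrates functionality only and offers no security, since anyone can factor N. The function names and the rejection sampler `random_unit` are illustrative, not from the paper.

```python
import math
import secrets

def setup(p=1019, q=1031):
    """Toy Setup: N = p*q with gcd((p-1)(q-1), 7) = 1. These tiny, public
    primes are for illustration only; a real N has large secret factors."""
    assert math.gcd((p - 1) * (q - 1), 7) == 1
    return p * q

def random_unit(N):
    """Sample r uniformly from Z_N^* by rejection sampling."""
    while True:
        r = secrets.randbelow(N)
        if math.gcd(r, N) == 1:
            return r

def commit(N, m):
    """Commit(m) -> (c, r) with c = m^7 + 3 r^7 mod N."""
    r = random_unit(N)
    return (pow(m, 7, N) + 3 * pow(r, 7, N)) % N, r

def open_commitment(N, c, m, r):
    """Open(c, m, r) -> 1 iff m, r lie in Z_N^* and c = m^7 + 3 r^7 mod N."""
    units = all(0 <= v < N and math.gcd(v, N) == 1 for v in (m, r))
    return int(units and c == (pow(m, 7, N) + 3 * pow(r, 7, N)) % N)

N = setup()
c, r = commit(N, 12345)
assert open_commitment(N, c, 12345, r) == 1
assert open_commitment(N, c, 12346, r) == 0  # wrong message is rejected
```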

Security Properties. The following theorem summarizes the security properties 
of the scheme. 

Theorem 12. The commitment scheme is statistically hiding and computationally 
binding under Assumption 11. 

Proof. Statistical hiding follows from a standard argument given in Appendix A. 
Computational binding follows directly from the collision resistance of $f_{\mathrm{ZAG}}$ over 
$\mathbb{Z}_N$. One issue is that the Setup algorithm generates a random N such that $\gcd(\phi(N), 7) = 1$, 
whereas Assumption 11 imposes no such restriction on N. Nevertheless, Assumption 11 
implies the collision resistance of $f_{\mathrm{ZAG}}$ for this modified distribution 
of N. By way of contradiction, assume there were an algorithm $\mathcal{A}$ which finds 
collisions in $f_{\mathrm{ZAG}}$ with non-negligible probability ε when $\gcd(\phi(N), 7) = 1$. Since 
algorithm RSAgen in Assumption 11 generates such N with probability about 
$(5/6)^2 = 25/36$ (each of p and q avoids $p \equiv 1 \pmod 7$ with probability about 5/6), it 
follows that $\mathcal{A}$ will find collisions in $f_{\mathrm{ZAG}}$ with probability at least 
$(25/36)\varepsilon$ when N is sampled as in algorithm RSAgen, violating Assumption 11. 
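The $(5/6)^2$ estimate rests on a heuristic: $\gcd(\phi(N), 7) = 1$ exactly when neither p nor q is $\equiv 1 \pmod 7$, and by Dirichlet's theorem primes are asymptotically equidistributed over the six nonzero residues mod 7. The sketch below checks the single-prime fraction empirically for primes below $2 \cdot 10^5$ (an arbitrary bound) and, treating p and q as independent, squares it.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: all primes < limit."""
    is_p = bytearray([1]) * limit
    is_p[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if is_p[i]:
            is_p[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i, flag in enumerate(is_p) if flag]

primes = [p for p in primes_below(200_000) if p > 7]
# gcd(phi(N), 7) = 1 iff neither p nor q is congruent to 1 mod 7.
good = sum(1 for p in primes if p % 7 != 1)
frac = good / len(primes)
print(f"Pr[p != 1 mod 7] ~ {frac:.4f}  (5/6 = {5 / 6:.4f})")
print(f"both p and q     ~ {frac ** 2:.4f}  (25/36 = {25 / 36:.4f})")
```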

Efficiency. Generating and verifying standard Pedersen commitments requires 
two modular exponentiations (or elliptic curve scalar multiplications). In contrast, 
our scheme requires only a few modular multiplications. On a workstation 
with a 3.20 GHz processor, for example, computing 10,000 Pedersen commitments 
in a subgroup of order ≈ $2^{256}$ modulo a 2048-bit prime takes 16.54 seconds. 
Computing the same number of commitments using this new scheme takes 
0.925 seconds, a 17.9× speed-up. 
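To make “a few modular multiplications” concrete: $x^7$ can be computed with the addition chain $1 \to 2 \to 3 \to 6 \to 7$, so a commitment costs about nine multiplications, counting the multiplication by 3. By contrast, a square-and-multiply exponentiation with a 256-bit exponent takes on the order of hundreds of multiplications. The counting wrapper `MulCounter` is a hypothetical helper for illustration; it measures operation counts, not running time, and the toy modulus is arbitrary.

```python
class MulCounter:
    """Modular multiplication mod N that counts how often it is called."""

    def __init__(self, N):
        self.N = N
        self.count = 0

    def mul(self, a, b):
        self.count += 1
        return a * b % self.N

def pow7(x, ctx):
    """x^7 via the addition chain 1 -> 2 -> 3 -> 6 -> 7 (4 multiplications)."""
    x2 = ctx.mul(x, x)
    x3 = ctx.mul(x2, x)
    x6 = ctx.mul(x3, x3)
    return ctx.mul(x6, x)

def commit_value(m, r, ctx):
    """m^7 + 3 r^7 mod N using 4 + 4 + 1 = 9 modular multiplications."""
    return (pow7(m, ctx) + ctx.mul(3, pow7(r, ctx))) % ctx.N

N = 1019 * 1031  # any modulus works for counting; toy value
ctx = MulCounter(N)
c = commit_value(12345, 678, ctx)
assert c == (pow(12345, 7, N) + 3 * pow(678, 7, N)) % N
print(ctx.count)  # 9
```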

4.3 Nestable Commitments 

We say that a commitment scheme (Setup, Commit, Open) is nestable if, given 
Pedersen commitments to a message m, randomness r, and a commitment c, 
there is a succinct zero-knowledge proof of knowledge of values m, r, and c 
such that c = Commit(m, r). In other words, there is a succinct protocol for 
proving knowledge of an opening of an opening of a Pedersen commitment. For 
our purposes, a succinct zero-knowledge protocol is one in which the proof is 
k|c| bits long, where k is a constant which does not depend on the security 
parameter. 

We adopt the notation of Camenisch and Stadler [9] for specifying zero-knowledge 
proof-of-knowledge protocols. For example, $\mathrm{PoK}\{x, y : X = g^x \vee Y = g^y\}$ 
indicates a protocol in which the prover and verifier share public values g, 
X, and Y, and the prover demonstrates knowledge of either a value x such that 
$X = g^x$ or a value y such that $Y = g^y$. 

Given Pedersen commitments 

$C_m = g^m h^{s_m}, \quad C_r = g^r h^{s_r}, \quad C_c = g^c h^{s_c},$ 

a nestable commitment scheme has a succinct zero-knowledge protocol which 
proves knowledge of the statement: 

$\mathrm{PoK}\{m, r, c, s_m, s_r, s_c : C_m = g^m h^{s_m} \wedge C_r = g^r h^{s_r} \wedge C_c = g^c h^{s_c} \wedge c = \mathrm{Commit}(m, r)\}.$ 

For the commitment scheme outlined above, $\mathrm{Commit}(m, r) = m^7 + 3r^7 \bmod N$, 
so the proof of knowledge protocol is: 

$\mathrm{PoK}\{m, r, c, s_m, s_r, s_c : C_m = g^m h^{s_m} \wedge C_r = g^r h^{s_r} \wedge C_c = g^c h^{s_c} \wedge c = m^7 + 3r^7 \bmod N\}.$ 

The group $G = \langle g \rangle = \langle h \rangle$ used for the proof must be a group of composite order 
N, where N is the RSA modulus used in the commitment scheme. As usual for 
Pedersen commitments, no one should know the discrete logarithm $\log_g h$ in G. 
For example, G might be the order-N subgroup of the group $\mathbb{Z}_p^*$ for a prime 
$p = 2kN + 1$, where k is a small prime. Alternatively, G could be an elliptic 
curve group of order N. 
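The following toy sketch builds such a group: it searches for the smallest prime k with $p = 2kN + 1$ prime (called `P` in the code to avoid a clash with the RSA factor p) and maps random elements into the order-N subgroup by raising them to the power 2k. With N = 11 · 23 it finds k = 2 and P = 1013. It is illustrative only: at this size anyone can compute $\log_g h$, whereas a real deployment must generate h so that nobody knows that discrete logarithm. All function names are our own.

```python
import random

def is_prime(n):
    """Trial division; adequate for toy sizes only."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def pedersen_group(N, factors_of_N, seed=0):
    """Find a prime P = 2kN + 1 with k a small prime, and two elements g, h
    of order exactly N in Z_P^*."""
    k = next(k for k in range(2, 10_000) if is_prime(k) and is_prime(2 * k * N + 1))
    P = 2 * k * N + 1
    rng = random.Random(seed)

    def element_of_order_N():
        while True:
            # Z_P^* is cyclic of order 2kN; raising to 2k lands in the order-N subgroup.
            z = pow(rng.randrange(2, P - 1), 2 * k, P)
            # The order is exactly N iff z^(N/f) != 1 for every prime factor f of N.
            if all(pow(z, N // f, P) != 1 for f in factors_of_N):
                return z

    return P, element_of_order_N(), element_of_order_N()

N = 11 * 23                       # toy RSA-style modulus
P, g, h = pedersen_group(N, [11, 23])

def pedersen(m, s):
    """Pedersen commitment g^m h^s mod P, with exponents taken mod N."""
    return pow(g, m % N, P) * pow(h, s % N, P) % P

# Exponents live in Z_N, so commitments are additively homomorphic mod N.
assert pedersen(3, 5) * pedersen(4, 6) % P == pedersen(7, 11)
assert pedersen(1, 0) == g and pow(g, N, P) == 1
```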

The fact that the verification equation for our commitment scheme is a fixed 
low-degree polynomial means that this proof can be executed succinctly using 
standard techniques m- This proof requires only one challenge and 20 elements 
of G. If N is a 2048-bit modulus, then the proof is roughly 5 KB in length. 

In contrast, nesting Pedersen commitments inside of other Pedersen com- 
mitments does not lead to succinct proofs of knowledge. The shortest proofs 
of knowledge for nested Pedersen commitments require a number of group ele- 
ments that is linear in the security parameter m Sec. 5.3.3], whereas our proof 
requires only a constant number of group elements. 

Being able to prove knowledge of an opening of a commitment which is itself 
nested inside of a commitment proves useful in constructing distributed e-cash 
schemes (Section 4.4) and set membership proofs (Section 5). 

4.4 Application Sketch: Anonymous Bitcoins 

The Zerocoin scheme for anonymizing Bitcoin transactions requires a proof of 
knowledge of an opening of an opening of a commitment [24]. For this purpose, 
Zerocoin uses Pedersen commitments nested inside of Pedersen commitments, 
which requires a proof of knowledge of the form $\mathrm{PoK}\{m, r, s : c = g^{g^m h^r} h^s\}$. 
The number of group elements exchanged in this proof is linear in the security 
parameter, since the proof uses single-bit challenges. 

By using our nestable commitment scheme for the “inner” commitment, we 
reduce the number of group elements from linear to constant in the security 
parameter. This reduces the length of anonymous coin transactions in the 
Zerocoin scheme by roughly 70% (down to 12.0 KiB from 39.4 KiB when using 
a 2048-bit RSA modulus). When instantiated with our nestable commitments, 
Zerocoin maintains its unconditional privacy property and maintains 
double-spending prevention under Assumption 11. 

5 Succinct Set Membership Proofs 

A cryptographic accumulator, first defined by Benaloh and De Mare [5], is a 
primitive which allows a prover to accumulate a large set of values $S = \{x_1, \ldots, x_n\}$ 
into a single short value A. For every value $x_i$ in the accumulator, there is an 
accompanying short witness $w_i$. By exhibiting a valid $(x_i, w_i)$ pair, a prover can 
convince a verifier that the value $x_i$ was actually accumulated into A. Informally, 
the security property of the accumulator requires that it be difficult to find a 
valid value-witness pair $(x^*, w^*)$ such that $x^* \notin S$. 

Benaloh and De Mare give one example application of this primitive: the 
administrator of a club can accumulate the names of the members of the club into 
an accumulator A, distribute a witness to each member, and publish the 
accumulator value A. The value A is a concise representation of the club’s membership 
list. A person can prove membership in the club by revealing her name $x_i$ and 
the witness $w_i$ to a verifier. 

Camenisch and Lysyanskaya extend the basic accumulator primitive to allow for 
zero-knowledge proofs of accumulator membership [8]. That is, a prover can 
convince a verifier that the prover “knows” a valid value-witness pair (x, w) for 
a particular accumulator A, without revealing x or w. This augmented primitive 
allows for privacy-preserving authentication: a club member can prove that she 
is some member of the club defined by a membership list A without revealing 
which member she is. 

We provide a construction that offers the same functionality as the 
Camenisch-Lysyanskaya scheme at the cost of requiring slightly larger proofs, of length 
O(log |S|) instead of length O(1). The benefit of our construction is its simplicity: 
compared with the Camenisch-Lysyanskaya proof, which requires a nuanced 
security analysis, ours is relatively straightforward. 

5.1 Definitions 

A cryptographic accumulator is a tuple of algorithms (Setup, Accumulate, Witness, 
Verify) with the following functionalities: 

Setup(λ) → pp. Given a security parameter λ as input, output the public 
parameters pp. The other functions take pp as an implicit input. Setup runs 
in time polynomial in λ. 

Accumulate(S = {x_1, ..., x_n}) → A. Accumulate the n items in the set S into 
an accumulator value A. 

Witness(S, x) → w or ⊥. If $x \notin S$, return ⊥. Otherwise, return a witness w that 
x was accumulated in Accumulate(S). To be useful, the length of w should 
be short (constant or logarithmic) in the size of S. 

Verify(A, x, w) → {0, 1}. Return “1” if the value-witness pair (x, w) is valid for 
the accumulator A. Return “0” otherwise. 

Camenisch and Lysyanskaya, following Barić and Pfitzmann [4], define an 
accumulator as secure if, for all polynomial-time adversaries $\mathcal{A}$: 

$\Pr[pp \leftarrow \mathrm{Setup}(\lambda);\ (S, x^*, w^*) \leftarrow \mathcal{A}(pp);\ A \leftarrow \mathrm{Accumulate}(S) : x^* \notin S \wedge \mathrm{Verify}(A, x^*, w^*) = 1] < \mathrm{negl}(\lambda).$ 

If an accumulator satisfies this definition, then it is infeasible for an adversary 
to prove that a value x was accumulated in a value A if it was not. 

Zero-Knowledge Proof of Knowledge of an Accumulated Value. In many 
applications, it is useful for a prover to be able to convince a verifier that the prover 
knows some value inside of an accumulator without revealing which value the 
prover knows. Such a proof protocol should satisfy the standard properties of 
completeness, soundness, and zero-knowledge [11, Sec. 2.9]. Camenisch and 
Lysyanskaya construct one such proof-of-knowledge protocol for the strong-RSA 
accumulator [8], and we exhibit a protocol for a Merkle-tree-style accumulator in 
Section 5.3. 
5.2 Construction 

Given a collision-resistant hash function $H : D \times D \to D$, which operates on 
a domain D such that $S \subseteq D$, it is possible to construct a simple accumulator 
using Merkle trees. For example, given a set $S = \{x_1, x_2, x_3, x_4\}$, the accumulator 
value is $A = H(H(x_1, x_2), H(x_3, x_4))$. A witness that an element 
$x_i$ is in the accumulator is the set of O(log |S|) nodes along the Merkle tree 
needed to verify a path from $x_i$ to the root (labeled A). 

The limitation of this accumulator construction is that it does not admit 
simple zero-knowledge proofs of knowledge of (x, w) pairs unless H has a very 
special form. For instance, if H is a standard cryptographic hash function (e.g., 
SHA-256), there is no straightforward protocol for proving knowledge of a 
preimage under H in zero knowledge. By instantiating H with the 
function $H(x, y) = x^7 + 3y^7 \bmod N$, as we demonstrate in the following section, 
it is possible to execute this zero-knowledge proof succinctly. 



Fig. 1. A perfect Merkle tree with eight leaves rooted at A. The shaded nodes are a 
witness to the fact that $m_2$ is accumulated in A. The tree invariant is $a_\star = H(a_{\star 0}, a_{\star 1})$. 


We first recall the standard construction of Merkle trees [21] and then describe 
the zero-knowledge proof construction. The construction from a general 
collision-resistant hash function family $\{\mathcal{H}_\lambda\}_{\lambda=1}^{\infty}$ follows. 

Setup(λ) → H. Given a security parameter λ as input, sample a λ-secure 
collision-resistant hash function H from $\mathcal{H}_\lambda$. Setup runs in time polynomial 
in λ. 

Accumulate(S = {x_1, ..., x_n}) → A. If |S| is not a power of two, insert “dummy” 
elements into S (e.g., by duplicating the first element of S) until |S| is a 
power of two. Construct a perfect Merkle tree of depth $d = \log_2 |S|$ using 
the hash function H with the members of S as its leaves and return the 
root A. Figure 1 depicts an example tree of depth three. 

Witness(S, x) → w or ⊥. If $x \notin S$, return ⊥. Otherwise, let the path from A to 
the message x be $P = (A, a_{b_1}, a_{b_1 b_2}, \ldots, a_{b_1 \cdots b_d})$, where, for any bit string ⋆, $a_{\star 0}$ is the 
left child of node $a_\star$, $a_{\star 1}$ is the right child of node $a_\star$, and d is the number of 
edges between the root and the leaf labeled x in the tree. The first component 
of the witness is the list of siblings of the nodes in the path P: 
$w_\alpha = (a_{\bar{b}_1}, a_{b_1 \bar{b}_2}, a_{b_1 b_2 \bar{b}_3}, \ldots, a_{b_1 \cdots b_{d-1} \bar{b}_d})$. The second component of the witness is a bit 
vector indicating where x is located in the tree: $w_\beta = (b_1, \ldots, b_{d-1}, b_d)$. 

The witness is $w = (w_\alpha, w_\beta)$. 

Verify(A, x, w) → {0, 1}. Interpret the witness as $(w_\alpha, w_\beta)$ such that 
$w_\alpha = (w_1, \ldots, w_d)$ and $w_\beta = (b_1, \ldots, b_d)$. To verify the witness, let $t_d = x$ and 
recompute the intermediate nodes of the tree from the leaf back to the root. 
Specifically, compute test nodes $t_i$ for $i = d-1, \ldots, 0$: 

$t_i = \begin{cases} H(t_{i+1}, w_{i+1}) & \text{if } b_{i+1} = 0 \\ H(w_{i+1}, t_{i+1}) & \text{if } b_{i+1} = 1 \end{cases}$ 

Return “1” if $A = t_0$ and “0” otherwise. 
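A compact Python sketch of this Merkle-tree accumulator, instantiated with $H(x, y) = x^7 + 3y^7 \bmod N$ as in Section 5.3, follows. It assumes a toy, insecure modulus N = 1019 · 1031, returns `None` in place of ⊥, and stores the whole tree so that `witness` can read off siblings; these are illustrative choices, and the function names are our own. Bits follow the convention above: $b_i = 1$ means the path node at depth i is a right child.

```python
def H(x, y, N):
    """Hash H(x, y) = x^7 + 3 y^7 mod N."""
    return (pow(x, 7, N) + 3 * pow(y, 7, N)) % N

def accumulate(S, N):
    """Build a perfect Merkle tree; levels[0] == [A], levels[-1] == leaves."""
    leaves = list(S)
    while len(leaves) & (len(leaves) - 1):   # pad to a power of two
        leaves.append(leaves[0])             # dummy = copy of the first element
    levels = [leaves]
    while len(levels[0]) > 1:
        below = levels[0]
        levels.insert(0, [H(below[2 * j], below[2 * j + 1], N)
                          for j in range(len(below) // 2)])
    return levels

def witness(levels, x):
    """Return (w_alpha, w_beta) for leaf x, or None (standing in for ⊥)."""
    leaves = levels[-1]
    if x not in leaves:
        return None
    j, d = leaves.index(x), len(levels) - 1
    # The path node at depth i has index j >> (d - i); its last bit is b_i.
    bits = [(j >> (d - i)) & 1 for i in range(1, d + 1)]
    siblings = [levels[i][(j >> (d - i)) ^ 1] for i in range(1, d + 1)]
    return siblings, bits

def verify(A, x, w, N):
    """Recompute t_d = x, t_{d-1}, ..., t_0 and compare t_0 with A."""
    siblings, bits = w
    t = x
    for i in reversed(range(len(bits))):     # list index i holds w_{i+1}, b_{i+1}
        t = H(t, siblings[i], N) if bits[i] == 0 else H(siblings[i], t, N)
    return int(t == A)

N = 1019 * 1031                              # toy modulus, NOT secure
levels = accumulate([5, 17, 23, 42, 99], N)
A = levels[0][0]
assert all(verify(A, x, witness(levels, x), N) == 1 for x in [5, 17, 23, 42, 99])
assert verify(A, 24, witness(levels, 23), N) == 0
```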

5.3 Proof of Knowledge of an Accumulated Value 

When instantiated with a general hash function H, the Merkle-tree accumulator 
of the prior section does not admit a succinct proof of knowledge of an 
accumulated value. When instantiated with our new hash function $H(x, y) = 
x^7 + 3y^7 \bmod N$, however, there is a succinct proof of knowledge that the prover 
knows an opening of a Pedersen commitment $C_m$ such that some leaf of the 
accumulator Merkle tree has label m. The proof requires a group $G = \langle g \rangle = \langle h \rangle$ 
of order N, as in Section 4.3. The proof length is O(log |S|), for a set S of 
accumulated elements. 

The Setup algorithm outputs an RSA modulus $N \leftarrow \mathrm{RSAgen}(\lambda)$ such that 
$\gcd(\phi(N), 7) = 1$ and such that no one knows the factorization of N. The hash 
function H is $H(x, y) = x^7 + 3y^7 \bmod N$ and the accumulator domain D is $\mathbb{Z}_N^*$. 

The high-level idea is that, if the prover wants to convince the verifier that a 
particular value m is accumulated in A, the prover commits to the values of all of 
the nodes in the Merkle tree along the path from the root to the leaf labeled m. 
The prover also commits to all of the witness values needed to recreate the path 
from the leaf labeled m down to the tree root. The prover can then convince the 
verifier in zero knowledge that these commitments together contain a path to 
some leaf in the tree, without revealing which one. 

Assume that the prover has a value-witness pair (x, w) which convinces a 
verifier that x is accumulated in A. Denote the node values along the path from 
the root node, with value A, to the leaf node, with value x, in the Merkle tree 
as $p = (p_0, p_1, \ldots, p_d)$. Note that $p_0 = A$ and $p_d = x$. 

The prover now commits to every value $p_i$ in this path and to the values of 
the left and right children of $p_i$ in the Merkle tree. If the value of the left child 
is $\ell_i$ and the right child is $r_i$, the commitments are, for $i = 0, \ldots, d-1$: 

$P_i = g^{p_i} h^{s_i}, \quad L_i = g^{\ell_i} h^{s'_i}, \quad R_i = g^{r_i} h^{s''_i},$ 

together with the leaf commitment $P_d = g^{p_d} h^{s_d}$. The prover opens $P_0$ by publishing $(p_0, s_0)$, 
and the verifier ensures that $p_0 = A$ and that $P_0 = g^{p_0} h^{s_0}$. 

The prover now can prove, for $i = 0, \ldots, d-1$, that each $(P_i, L_i, R_i)$ tuple is 
well-formed using a standard discrete logarithm proof: 

$\mathrm{PoK}_\alpha\{\ell, r, s, s_\ell, s_r : P_i = g^{\ell^7 + 3r^7} h^{s} \wedge L_i = g^{\ell} h^{s_\ell} \wedge R_i = g^{r} h^{s_r}\}.$ 

The prover then must prove that it knows an opening of the commitment $P_{i+1}$ 
such that the opening is equal to an opening of either $L_i$ or $R_i$. For $i = 0, \ldots, d-1$, 
the prover proves: 

$\mathrm{PoK}_\beta\{p, s, s_\ell, s_r : P_{i+1} = g^{p} h^{s} \wedge (L_i = g^{p} h^{s_\ell} \vee R_i = g^{p} h^{s_r})\}.$

The complete proof is the set of commitments $P_0, \ldots, P_d$ and $\{(L_i, R_i)\}_{i=0}^{d-1}$, the 2d 
proofs of knowledge, and the opening $(p_0, s_0)$ of the root commitment $P_0$. The 
total length is O(d) = O(log |S|), since the tree has depth d = log |S| and each 
of the elements of the proof has length which is constant in |S|. 

Security. The completeness and zero-knowledge properties follow from the 
properties of the underlying zero-knowledge proofs used and from the fact that 
Pedersen commitments are perfectly hiding. 

To show soundness, we must demonstrate that if the verifier accepts, it can 
extract a value-witness pair $(x^*, w^*)$ for the original Merkle tree with non-negligible 
probability by rewinding the prover. Starting at the root and working towards 
the leaves of the tree, we will be able to extract the prover’s witness for each of 
the proofs of knowledge with non-negligible probability. 

By induction on i, we can show that after d steps, the verifier will be able 
to extract the value-witness pair (x, w). The base case of the induction is i = 0, 
and the verifier can extract a preimage of A under H. From each of the $\mathrm{PoK}_\alpha$ proofs, 
the verifier extracts an element of the witness $w_\alpha$ (the preimage of $p_i$ under H). 
From each of the $\mathrm{PoK}_\beta$ proofs, the verifier extracts an element of the witness $w_\beta$ 
(whether the next node in the path is the left or right child of $p_i$). 

6 Claw-Free Functions, Signatures, and Chameleon Hashes 

In this section, we describe a few other applications arising from the assumed 
collision-freeness of the Zagier polynomial. 

Claw-Free Functions and Signatures. Assumption 11 immediately gives rise to 
a family of trapdoor claw-free functions [14]. For each RSA modulus N selected 
as in Section 4.2, we can define a function family: 

$F_N := \{f_a \mid a \in \mathbb{Z}_N^*\}$ where $f_a(x) = x^7 + 3a^7 \bmod N$. 

Following Damgård [14], a function family $F_N$ is claw-free if, given $F_N$, it 
is difficult to find a “claw” (x, y, a, b) with $a \neq b$ such that $f_a(x) = f_b(y)$. For all p.p.t. 
adversaries $\mathcal{A}$, we require that: 

$\Pr[N \leftarrow \mathrm{RSAgen}(\lambda);\ (x, y, a, b) \leftarrow \mathcal{A}(N) : a \neq b \wedge f_a(x) = f_b(y)] < \mathrm{negl}(\lambda).$ 

The claw-freeness of $F_N$ follows from Assumption 11, since a claw in $F_N$ implies 
a collision in $f(x, y) = x^7 + 3y^7 \bmod N$. Additionally, the function family $F_N$ 
is trapdoor claw-free, since anyone with knowledge of the factors of N can find 
claws easily by choosing (x, y, a) arbitrarily and solving for b. 

This family $F_N$ is not quite a family of trapdoor claw-free permutations, 
since the ranges of two functions $f_a$ and $f_b$ in $F_N$ are not necessarily equal (i.e., 
$f_b^{-1}(f_a(x))$ is sometimes undefined). However, the fraction of choices of (a, b, x) 
for which this event occurs is negligible, so it is possible to treat $F_N$ as if it 
were a family of trapdoor claw-free permutations. In particular, this function 
family leads to a signature scheme secure against adaptive chosen message at- 
tacks in the standard model by way of the Goldwasser-Micali-Rivest signature 
construction m- 

Chameleon Hash. This commitment scheme immediately gives rise to a new 
chameleon hash function. A chameleon hash, as defined by Krawczyk and Rabin, 
is a public hash function H(m, r) with a secret “trapdoor” [20]. A chameleon 
hash function has three properties: 

1. Without the trapdoor, it is difficult to find collisions in H. That is, it is hard 
to find colliding pairs (m, r) ≠ (m', r') such that H(m, r) = H(m', r'). 

2. Given the trapdoor, there is an efficient algorithm which takes (m, r, m') as 
input and outputs a value r' such that H(m, r) = H(m', r'). 

3. For any pair of messages m and m' in the message space $\mathcal{M}$, the distributions 
H(m, r) and H(m', r') are statistically close if r and r' are chosen at random. 

Chameleon hashes are useful in building secure signature schemes in the standard 
model m and for a number of other applications [20] . 

To derive a chameleon hash scheme from our commitment scheme, set the 
public key to the RSA modulus N, and the secret key to the factorization of N. 
The hash function H is then $H(m, r) = m^7 + 3r^7 \bmod N$. Without the factors 
of N, it is difficult to find collisions, but anyone with knowledge of the factors of 
N (the “trapdoor”) can find collisions. 
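Property 2 amounts to solving $3r'^7 = H(m, r) - m'^7 \pmod N$ for r', which is easy given $d = 7^{-1} \bmod \phi(N)$; the same computation finds claws in $F_N$ by solving for b. The sketch below assumes tiny, insecure primes and hypothetical function names, and relies on Python 3.8+ for modular inverses via `pow(x, -1, n)`. It does not check that r' lands in $\mathbb{Z}_N^*$, which can fail only when the target value shares a factor with N.

```python
import math

def keygen(p=1019, q=1031):
    """Toy keys: public N = p*q, trapdoor d = 7^{-1} mod phi(N).
    Tiny primes for illustration only."""
    phi = (p - 1) * (q - 1)
    assert math.gcd(phi, 7) == 1
    return p * q, pow(7, -1, phi)   # modular inverse needs Python 3.8+

def chameleon_hash(N, m, r):
    return (pow(m, 7, N) + 3 * pow(r, 7, N)) % N

def trapdoor_collision(N, d, m, r, m_new):
    """Find r_new with H(m_new, r_new) = H(m, r) by solving
    3 * r_new^7 = H(m, r) - m_new^7 (mod N) and taking a 7th root with d."""
    target = (chameleon_hash(N, m, r) - pow(m_new, 7, N)) * pow(3, -1, N) % N
    return pow(target, d, N)

N, d = keygen()
m, r, m_new = 123, 456, 789
r_new = trapdoor_collision(N, d, m, r, m_new)
assert chameleon_hash(N, m, r) == chameleon_hash(N, m_new, r_new)
```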

Chameleon hashes based on Pedersen commitments require two modular ex- 
ponentiations to evaluate, while ours requires just a few modular multiplications. 

7 Conclusion and Future Work 

We have used arithmetic properties of bivariate polynomials over $\mathbb{Q}$ to reason 
about their cryptographic properties in the ring $\mathbb{Z}_N$. Using one particular 
low-degree polynomial, $f_{\mathrm{ZAG}}$, we build a new statistically hiding commitment scheme, 
a conceptually simple cryptographic accumulator, and a computationally 
efficient chameleon hash function. To gain confidence in Conjecture 10, it would be 
interesting to prove it in the generic ring model [1]. We leave that for future 
work. 
A Proof of Statistical Hiding 

This appendix presents a proof that the commitment scheme of Section 4.2 is 
statistically hiding. To demonstrate that the statistical hiding property holds, we 
show that for any message $m \in \mathbb{Z}_N^*$, the distribution of the value of a commitment 
c to m is statistically close to uniform. 

The commitment c is generated by sampling a random value $r \leftarrow \mathbb{Z}_N^*$ and 
letting $c \leftarrow m^7 + 3r^7$. Since $r \in \mathbb{Z}_N^*$ and $\gcd(7, \phi(N)) = 1$, the RSA 
function $f(x) = x^7 \bmod N$ defines a permutation on $\mathbb{Z}_N^*$; since 3 is also invertible 
modulo N, the map $r \mapsto m^7 + 3r^7$ is injective on $\mathbb{Z}_N^*$. Thus, there are exactly 
$|\mathbb{Z}_N^*| = \phi(N)$ possible commitments to m, and each of these values occurs with 
equal probability. 
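Let the random variable C take on the value of the commitment to m and let 
U be a random variable uniformly distributed over $\mathbb{Z}_N$. Then: 

$\Pr[C = c_0] = \begin{cases} 1/\phi(N) & \text{if } c_0 - m^7 \in \mathbb{Z}_N^* \\ 0 & \text{otherwise} \end{cases} \qquad \Pr[U = c_0] = \frac{1}{N}.$ 

The statistical distance between these distributions is: 

$\Delta(C, U) = \frac{1}{2} \sum_{c_0 \in \mathbb{Z}_N} \left| \Pr[C = c_0] - \Pr[U = c_0] \right| = \frac{1}{2} \left( \phi(N) \cdot \frac{N - \phi(N)}{N \phi(N)} + \frac{N - \phi(N)}{N} \right) = \frac{N - \phi(N)}{N} = \frac{p + q - 1}{N} < \mathrm{negl}(\lambda).$ 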
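The calculation can be confirmed exactly on a toy modulus. The sketch below, assuming p = 11 and q = 17 (so $\gcd((p-1)(q-1), 7) = 1$) and message m = 5, enumerates every blinding value $r \in \mathbb{Z}_N^*$, checks that the commitment is uniform on a support of size $\phi(N)$, and computes the statistical distance with exact rational arithmetic; the result matches $(p + q - 1)/N = 27/187$. The helper names are illustrative.

```python
from fractions import Fraction
from math import gcd

def commitment_distribution(N, m):
    """Exact distribution of c = m^7 + 3 r^7 mod N for r uniform on Z_N^*."""
    units = [r for r in range(1, N) if gcd(r, N) == 1]
    counts = {}
    for r in units:
        c = (pow(m, 7, N) + 3 * pow(r, 7, N)) % N
        counts[c] = counts.get(c, 0) + 1
    return {c: Fraction(k, len(units)) for c, k in counts.items()}

def distance_to_uniform(dist, N):
    """Statistical distance 1/2 * sum_c |Pr[C = c] - 1/N| over c in Z_N."""
    u = Fraction(1, N)
    return sum(abs(dist.get(c, 0) - u) for c in range(N)) / 2

p, q = 11, 17                          # toy primes with gcd((p-1)(q-1), 7) = 1
N, phi = p * q, (p - 1) * (q - 1)
dist = commitment_distribution(N, m=5)
assert len(dist) == phi and set(dist.values()) == {Fraction(1, phi)}
print(distance_to_uniform(dist, N))    # 27/187, i.e. (p + q - 1)/N
```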
Abstract. In this paper we pick up an old challenge to design public 
key or white-box constructions from symmetric cipher components. We 
design several encryption schemes based on the ASASA structure ranging 
from fast and generic symmetric ciphers to compact public key and white- 
box constructions based on generic affine transformations combined with 
specially designed low degree non-linear layers. While explaining our de- 
sign process we show several instructive attacks on the weaker variants 
of our scheme.¹ 

Keywords: ASASA, multivariate cryptography, white-box cryptogra- 
phy, cryptanalysis, algebraic, symmetric. 


1 Introduction 

Since the development of public key cryptography in the late 1970’s it has been 
an open challenge to diversify the set of problems on which such primitives were 
built as well as to find faster alternatives, since most public key schemes were 
several orders of magnitude slower than symmetric ones. One of the directions 
was to design public key schemes from symmetric components. As public key 
encryption requires trapdoors, they have been hidden in secret affine layers [39] , 
field representations [43] , biased S-boxes and round functions 05]; however most 
of these schemes were broken 05B2] . We recall that a typical symmetric cipher is 
built from layers of affine transformations (A) and S-boxes (S), a design principle 
dating back to Shannon. It is thus natural to see what designs can be made from 
such components. Whereas the classical cipher AES-128 consists of 10 rounds 
with 19 layers in total, it is striking that a lot of effort has been put into designing 
public-key schemes with only 3 layers, using the ASA (affine-substitution-affine) 
structure. This has indeed been the mainstream of what is known as multivariate 
cryptography. However, in this case, the non-linear layer is usually an ad-hoc 
monolithic function over the full state, as opposed to an array of independent 
S-boxes. 

1 The full version of our paper is available at Eprint 0. 

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, PART I, LNCS 8873, pp. 63 [Ml 2014. 

(c) International Association for Cryptologic Research 2014 
It has been known that the scheme SASAS with two affine and three nonlinear 
layers is vulnerable to a structural attack if the nonlinear layer consists of 
several independent S-boxes [10]. The scheme ASA, though secure for a random 
monolithic S-box, has been shown to be weak in concrete multivariate proposals. In 
the seemingly unrelated area of white-box cryptography, the ASA approach to 
building obfuscated lookup tables failed multiple times. This suggests exploring the 
shortest scheme unbroken so far, the ASASA construction with injective 
S-boxes, in the application to symmetric (black-box), public-key, and white-box 
cryptography. Let us overview the related areas. 


Retrospective of Multivariate Cryptography 

The idea of multivariate cryptography dates back to Shannon’s idea that 
recovering the secrets in any cryptographic scheme could be reduced to solving 
particular systems of (Boolean) equations. Since nearly all forms of cryptology 
implicitly rely on the hardness of solving some kind of equation systems, it 
must be possible to design cryptographic schemes that explicitly rely on the 
hardness of this problem. In multivariate public-key schemes, the public key 
itself is a system of polynomial equations in several variables. It is well known 
that solving such systems is NP-hard, even when the polynomials are quadratic 
(hence the name of the MQ problem, which stands for Multivariate Quadratic 
polynomial systems). An additional advantage of the MQ cryptosystems is that 
they seem invulnerable to quantum algorithms and hence are candidates for 
post-quantum cryptography. 

Multivariate polynomials were used in cryptography in the early 1980s 
with the purpose of designing RSA variants with faster decryption. At that time, 
Imai and Matsumoto designed the first public-key scheme explicitly based on 
the hardness of MQ. It reached the general crypto community a few years 
later under the name C* [39]. 

Several years later, in 1995, Patarin [42] found a devastating attack against 
C*, allowing one to decrypt and to forge signatures very efficiently. Thereafter many 
multivariate schemes were proposed (we counted at least 20 of them), including 
a plethora of bogus and needlessly complicated proposals with a short lifespan. A 
few constructions stood out and received more attention than the others because 
of their simplicity and their elegance, such as HFE [43] and UOV [34]. 

However, the practical break of the first HFE challenge, supposed to offer 80 
bits of security, in 2003 [29], and the demise of SFLASH in 2007 [24], just after 
the NESSIE consortium proposed it for standardization, shattered the hopes and 
trust of the cryptographic community at large in multivariate cryptography. This 
brought the multivariate fashion to a stop. 

The main problem in multivariate crypto is that the selection of candidates 
for the nonlinear layer S is scarce (we will discuss this in Section 0]). What 
remains usually has so strong a structure within, that it can be detected and 
exploited even in the presence of unknown A layers. A very recent example is the 
promising matrix-based scheme ABC [48]. In recent years, a few researchers 
started designing public-key schemes based on the hardness of random instances 
of the MQ problem [46], though no drop-in replacement for conventional 
public-key encryption schemes has been proposed. Still, they are promising because 
there is a consensus that random instances are hard, and all known algorithms 
are exponential and impractical on random systems. 

This overview clearly indicates the need for a larger structure for multivariate 
cryptosystems, and suggests truly random polynomials in this context, which we 
use in our schemes. 


Retrospective of White-Box Cryptography 

In a parallel development, the notion of white-box cryptography (WBC) was introduced in [...]. The initial motivation was to embed symmetric secret keys into the implementation of popular standards like AES or DES in a way that binds the attacker to the specific implementation, for DRM purposes. Several proposals have been made [...], with the main idea to obfuscate key-dependent parts of the cipher and publish them as lookup tables, so that the entire encryption routine becomes just a sequence of table lookups. The obfuscation consists of wrapping the nonlinear transformation (S) with random affine transformations (A) so that the affine layers cancel each other after composition.
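A toy sketch of this table-wrapping idea, assuming 4-bit words, two random 4-bit permutations standing in for the key-dependent nonlinear steps, and random invertible affine encodings. All names and sizes are illustrative and are not taken from any published white-box design. Each published table is an instance of affine, S-box, affine, and chaining two tables cancels the internal encoding:

```python
import random

def parity(v):
    """XOR of all bits of an integer."""
    return bin(v).count("1") & 1

def random_affine_table(rng, n):
    """Lookup table of a random invertible affine map x -> Mx + c on n-bit words."""
    while True:
        rows = [rng.getrandbits(n) for _ in range(n)]
        linear = [sum(parity(r & x) << i for i, r in enumerate(rows)) for x in range(1 << n)]
        if len(set(linear)) == 1 << n:      # M is invertible iff x -> Mx is a bijection
            c = rng.getrandbits(n)
            return [y ^ c for y in linear]

def invert(table):
    """Inverse of a bijective lookup table."""
    inv = [0] * len(table)
    for x, y in enumerate(table):
        inv[y] = x
    return inv

rng = random.Random(2014)
N = 4                                        # toy word size (hypothetical)
S1 = list(range(1 << N))
rng.shuffle(S1)                              # stand-in for a key-dependent nonlinear step
S2 = list(range(1 << N))
rng.shuffle(S2)
A_in, A_mid, A_out = (random_affine_table(rng, N) for _ in range(3))
A_in_inv, A_mid_inv = invert(A_in), invert(A_mid)

# Published tables: each one is an ASA instance (affine o S-box o affine).
T1 = [A_mid[S1[A_in_inv[x]]] for x in range(1 << N)]
T2 = [A_out[S2[A_mid_inv[x]]] for x in range(1 << N)]

# Chaining the tables cancels A_mid: T2[T1[x]] == A_out(S2(S1(A_in^{-1}(x)))).
assert all(T2[T1[x]] == A_out[S2[S1[A_in_inv[x]]]] for x in range(1 << N))
```

Only the external encodings survive in the composition, which is why every individual table is an ASA instance and inherits the weaknesses discussed next.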

As a result, the lookup tables are just instantiations of the ASA structure. Moreover, since the nonlinear layers of AES and DES consist of independent S-boxes, the resulting ASA structure is very weak and can be attacked by a number of methods [7]. As demonstrated by Biryukov and Shamir [10], even a structure as large as SASAS is weak if the S-layers consist of smaller S-boxes. Surprisingly overlooked by the designers of white-box schemes, the generic attack [10] exploits multiset and differential properties of SASAS and applies to all the white-box proposals published so far. It appears that the mainstream ciphers are simply a poor choice for white-box implementations due to their high diffusion properties and the way the key is injected.

To formalize the problem, two notions have been suggested [37, 51]. The weak white-box implementation of a cryptographic primitive protects the key and its derivatives, i.e., it aims to prevent key-recovery attacks. This ensures that unauthorized users cannot obtain any compact information (e.g., the key or the set of subkeys) to decrypt the protected content.

The strong white-box implementation of a primitive protects against plaintext-recovery attacks, i.e., it does not allow decryption given the encryption routine with the embedded key. Such an implementation may replace public-key cryptosystems in many applications, in particular if it is based on an existing symmetric cipher and is reasonably fast for a legitimate user. The existing white-box implementations of AES and DES do not comply with this notion, since they are easily invertible, which is strikingly different from the black-box implementations of these ciphers. So far the only proposed candidate is the pairing-based obfuscator scheme with poor performance [47].

The ASASA-based designs may not only hide the key for the weak white-box 
implementation, but also provide non-invertibility aiming for the strong white- 
box construction. 
Our Contributions

We continue to explore the design space of compact schemes built from layers of affine mappings and S-boxes. We first note that there is no known generic attack on the 5-layered ASASA scheme with injective S-boxes in the flavour of [10], which makes the ASASA structure a promising framework for future white-box, black-box, and public-key schemes. Based on this principle, we propose and analyze the following constructions in this paper:

• Two public-key / strong white-box variants of the ASASA symmetric scheme: one based on Daemen’s quadratic S-boxes [19] (previously used in various hash functions) and another based on random expanding S-boxes (Section 2). We explore standard cryptanalytic attacks such as differential, linear and others, the recent decomposition attacks [50], and a new interpolation attack on weakened variants of our schemes (Section 3). We demonstrate that our set of parameters offers a comfortable security margin with respect to the existing attacks.

• A concrete instantiation of a fast symmetric ASASA-based block cipher with secret S-boxes and affine layers, comparable with AES in its encryption/decryption speed (Section 4).

• A concept of memory-hard white-box implementation for a symmetric block cipher and a concrete family of ciphers with tunable memory requirements (Section 5). It prevents key recovery and requires the adversary to share the entire set of lookup tables to allow an unauthorized user to decrypt. Therefore, the cipher solves the problem of weak white-box implementation.

Due to space limits, some references and attacks on the weakened variants of our schemes are omitted from this paper and are available in [8].

2 Asymmetric ASASA Schemes: Strong White-Box and Public-Key

The first ASASA cryptosystem, designed by Patarin and Goubin, was a public-key scheme with non-bijective S-boxes and was easily broken by Biham, who exploited this property [5]. Shortly afterwards, Biryukov and Shamir explored multi-layer schemes with bijective S-boxes and demonstrated a generic attack on the structure SASAS with two affine layers [10]. The outer S-boxes are recovered with a variant of the Square attack, whereas the inner affine layers are peeled off with linear algebra methods. It was clearly demonstrated that these properties disappear in larger schemes, and no attack on ASASA or other larger structures has been proposed since.


2.1 Strong White-Box Security 

We start with the notion of strong white-box security that summarizes the discussion in [49].
Definition 1. Let the pair of algorithms $(E, D)$ be a private-key encryption scheme, which takes a key $K$ as parameter. Let $\mathcal{O}_{E_K}$ be a function that computes $E_K$. We say that $\mathcal{O}_{E_K}$ is a secure strong white-box implementation of $E_K$ if it is computationally hard to obtain $\mathcal{D}'$ equivalent to $D_K$ given full access to $\mathcal{O}_{E_K}$.

In other words, an adversary should be unable to decrypt given the white-box implementation $\mathcal{O}_{E_K}$ of $E_K$. This notion closely resembles the definition of a trapdoor permutation used to construct a public-key encryption scheme. As we will see, our asymmetric proposals are suitable for both notions.

2.2 Outline 

We propose several asymmetric instantiations of the ASASA structure, which may serve both in the white-box and in the public-key setting. We have not found any reasonable use for lookup tables in this framework², and hence look for polynomial-based S-boxes. In order to keep the description reasonably small, we restrict ourselves to polynomials of degree two over some finite field, so that the resulting scheme has degree four. This approach brings us to the area of multivariate cryptography, which aims to design cryptographic primitives based on multivariate polynomials over finite fields.

² So far all attempts to hide a trapdoor in lookup-table-based designs have failed. We investigated this problem and conjecture that such a scheme does not exist, at least given the state of the art in the design of preimage-resistant functions.

Let us introduce the following notation. The public key/white-box implementation is exposed as a set of polynomials $b$, which is constructed as the composition

$$b = U \circ a_2 \circ T \circ a_1 \circ S, \qquad (1)$$

where $a_1, a_2$ are nonlinear transformations, and $U, T, S$ are affine transformations.

Fig. 1. The ASASA structure: two nonlinear layers surrounded by affine layers (layers: affine, nonlinear, affine, nonlinear, affine).

There have been many proposals for nonlinear layers in the ASA structure, and various attacks exploited these choices. Most attacks do not obviously carry over to degree 4, as they compute, e.g., differentials of the public key, which are linear functions in the ASA case. The notable exception is the decomposition attack [30, 31], which will be discussed in Section 2.3.

We offer two fresh ideas for the nonlinear layers in ASASA. The first candidate is the so-called χ-function. It derives from invertible cellular automata and was brought into symmetric cryptography by Daemen. To the best of our knowledge, it has never been used in multivariate cryptography.

The second candidate is a set of random injective S-boxes of degree 2. Since the families of low-degree permutations are small and do not absorb much randomness, we propose to use expanding S-boxes, which can be key-dependent.
With an expansion rate of 2, it is rather easy to obtain injective transformations and still keep them quadratic³.

Limitations for expanding schemes. Whatever construction is used, an expanding scheme has a clear limitation in the public-key and white-box setting: only a tiny subset of potential ciphertexts is decryptable, which makes the encryption and decryption processes non-interchangeable. As a result, the expanding scheme can be used for encryption only and cannot produce signatures. Also, in the white-box context, it cannot be used for decrypting the content. On the other hand, it can still be used to ensure tamper-resistance of software [41].
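To make the composition (1) concrete, the sketch below chains five toy layers on a 12-bit state. It assumes random bijective 3-bit lookup S-boxes (closer to the black-box variant of Section 4 than to the quadratic polynomial S-boxes of this section) and random invertible affine maps over GF(2); sizes and names are hypothetical and chosen only so that exhaustive checks run instantly.

```python
import random

N_BITS, M = 12, 3                 # toy state and S-box widths (hypothetical)
N_SBOXES = N_BITS // M

def parity(v):
    return bin(v).count("1") & 1

def gf2_rank(rows):
    """Rank over GF(2) of row bitmasks (elimination on leading bits)."""
    pivots = {}
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = r
                break
            r ^= pivots[lead]
    return len(pivots)

def random_affine(rng):
    """Random invertible affine layer (rows of M, constant c): x -> Mx + c."""
    while True:
        rows = [rng.getrandbits(N_BITS) for _ in range(N_BITS)]
        if gf2_rank(rows) == N_BITS:
            return rows, rng.getrandbits(N_BITS)

def apply_affine(layer, x):
    rows, c = layer
    return sum(parity(r & x) << i for i, r in enumerate(rows)) ^ c

def random_s_layer(rng):
    """N_SBOXES independent random bijective M-bit S-boxes."""
    boxes = []
    for _ in range(N_SBOXES):
        box = list(range(1 << M))
        rng.shuffle(box)
        boxes.append(box)
    return boxes

def apply_s_layer(boxes, x):
    mask = (1 << M) - 1
    return sum(box[(x >> (M * j)) & mask] << (M * j) for j, box in enumerate(boxes))

def asasa(key, x):
    """b = U o a2 o T o a1 o S, applied right to left as in Eq. (1)."""
    S, a1, T, a2, U = key
    x = apply_affine(S, x)
    x = apply_s_layer(a1, x)
    x = apply_affine(T, x)
    x = apply_s_layer(a2, x)
    return apply_affine(U, x)

rng = random.Random(5)
key = (random_affine(rng), random_s_layer(rng), random_affine(rng),
       random_s_layer(rng), random_affine(rng))
outputs = {asasa(key, x) for x in range(1 << N_BITS)}
print(len(outputs) == 1 << N_BITS)   # True: bijective layers compose to a permutation
```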

2.3 Defeating Decomposition Algorithm with Perturbations 

The authors of recently published decomposition algorithms [30, 31] claim to break ASASA schemes with quadratic nonlinear layers with complexity $O(n^9)$, where $n$ is the number of variables. The decomposition problem is formulated as follows: given a set of polynomials $h = (h_1, \dots, h_u)$ over the polynomial ring $K[x_1, \dots, x_n]$ ($K$ denoting an arbitrary field), find any $f = (f_1, \dots, f_u)$ and $g = (g_1, \dots, g_n)$ over $K[x_1, \dots, x_n]$, of degree smaller than that of $h$, whose composition is equal to $h$:

$$h = (h_1, \dots, h_u) = \bigl(f_1(g_1, \dots, g_n), \dots, f_u(g_1, \dots, g_n)\bigr).$$
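The sketch below builds a toy instance of this problem over $\mathbb{F}_2$: random quadratic maps $f$ and $g$ on 6 variables are composed into a degree-4 map $h$. It only generates the input of a decomposition algorithm and does not implement the algorithms of [30, 31]. Polynomials are reduced modulo $x_i^2 = x_i$, which is valid for evaluation on $\mathbb{F}_2$ points but is an assumption of this sketch, not of the general problem over $K$; all names are hypothetical.

```python
import itertools
import random

# A polynomial is a set of monomials; a monomial is a bitmask of its variables (0 = constant 1).

def poly_mul(p, q):
    out = set()
    for m1 in p:
        for m2 in q:
            out ^= {m1 | m2}            # x_i * x_i = x_i, coefficients add mod 2
    return out

def compose(f, g):
    """Return f(g_0, ..., g_{n-1}) for a polynomial f and a list g of polynomials."""
    out = set()
    for m in f:
        term = {0}                      # the constant polynomial 1
        for i, gi in enumerate(g):
            if (m >> i) & 1:
                term = poly_mul(term, gi)
        out ^= term
    return out

def degree(p):
    return max((bin(m).count("1") for m in p), default=0)

def evaluate(p, x):
    return sum(1 for m in p if (x & m) == m) & 1

def random_quadratic(rng, n):
    monos = [0] + [1 << i for i in range(n)]
    monos += [(1 << i) | (1 << j) for i, j in itertools.combinations(range(n), 2)]
    return {m for m in monos if rng.getrandbits(1)}

rng = random.Random(9)
n = u = 6
g = [random_quadratic(rng, n) for _ in range(n)]   # inner quadratic layer
f = [random_quadratic(rng, n) for _ in range(u)]   # outer quadratic layer
h = [compose(fi, g) for fi in f]                   # degree-4 input of the decomposition problem
print([degree(hi) for hi in h])                    # every entry is at most 4
```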

In the context of the ASASA structure with quadratic S-boxes, the sets $f$ and $g$ produced by a decomposition algorithm are linearly equivalent to the internal ASA structures. This does not fully constitute a break, since the adversary still needs to invert both ASA constructions. Moreover, the proposed algorithms have not been applied to the parameters and fields that we choose. Nevertheless, it is desirable to find some countermeasure.

Our idea is to introduce some perturbation just after the second S layer in the form of several key-dependent secret polynomials of degree 4. A similar approach has been used by Ding in his modifications of C* [21] and HFE [22]. In some cases (notably HFE), the perturbation could be identified and removed [25], thanks to a differential attack exploiting properties of the nonlinear transformations. The use of perturbation polynomials has also been linked to the LWE (Learning with Errors) framework [32], but the full application of LWE to multivariate cryptography remains to be explored.

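Denoting the perturbation polynomials as another nonlinear transformation $a_p$, we obtain the modified public key $b_p$:

$$b_p = U \circ [a_p + (a_2 \circ T \circ a_1 \circ S)], \qquad (2)$$

so

$$b_p(x) = b(x) + U\, a_p(x),$$

where $U$ acts on $a_p(x)$ only through its linear part.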

³ Our experiments show that S-boxes with an even smaller expansion rate of 1.75 can be found.
Hence the perturbation polynomials are mixed by the last affine transformation and spread over the public key. The encryption process remains exactly the same, while for decryption we have to guess the values of these polynomials. Suppose that we work over $\mathbb{F}_2$ and that $a_p$ is sparse, containing only $w$ nonzero polynomials. Let each polynomial be nonzero in $q \cdot 2^n$ points. Then the noise on average consists of $qw$ bit flips, and we guess their positions after about $\binom{w}{qw}$ attempts. For instance, $a_p$ with 8 nonzero polynomials of weight $\approx 2^{n-1}$ requires about $2^6$ trial decryptions on average.
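A small sketch of the guessing step, assuming $w = 8$ and $q = 1/2$ as in the example above: the decryptor enumerates all placements of $qw = 4$ flipped bits among the 8 perturbed positions. The helper name is hypothetical.

```python
import itertools
import math

def flip_patterns(w, flips):
    """All w-bit masks with exactly `flips` ones: the noise patterns a decryptor tries."""
    for positions in itertools.combinations(range(w), flips):
        yield sum(1 << p for p in positions)

w = 8                      # nonzero perturbation polynomials
q = 0.5                    # each one nonzero on about half of the inputs (weight ~2^(n-1))
flips = round(q * w)       # expected number of flipped bits, qw = 4
patterns = list(flip_patterns(w, flips))
print(len(patterns), round(math.log2(len(patterns)), 2))   # 70 6.13
```

The count $\binom{8}{4} = 70 \approx 2^{6.1}$ matches the $2^6$ trial decryptions quoted above.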

We distinguish true plaintexts from false ones either by recomputing the perturbation polynomials or by using expanding S-boxes, so that noisy bits prohibit inversion. Padding the plaintexts with zero bits also helps, but prevents turning encryption into decryption. The position of the noisy bits does not matter much, since it is concealed by the affine transformation. However, if we filter out noise with expanding S-boxes, it makes sense to spread the noisy bits so that an S-box can still be inverted in the presence of noise.

2.4 χ-Scheme

Our first idea was to build the nonlinear transformation out of a popular quadratic S-box χ [19, Section 6.6.2], which has been used in several hash functions including SHA-3/Keccak [4]. The transformation χ can be defined for every odd length $k = 2t + 1$ and has the following features:

• It has degree 2 in the forward direction, but degree $t + 1$ in the backward direction.

• It can be efficiently inverted for every size [8].

• Its differential and linear properties have been widely studied [...].

Fig. 2. Small perturbations to defeat decomposition attacks as injection of sparse high-degree polynomials (two layers of degree d, plus a sparse perturbation of degree 2d).

The S-box χ of length $k$ is defined as follows:

$$\chi(x_0, x_1, x_2, \dots, x_{k-1}) = (y_0, y_1, y_2, \dots, y_{k-1}),$$

where

$$y_i = x_i \oplus x_{i+1} x_{i+2} \oplus x_{i+2},$$

and indices are computed modulo $k$.
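The definition translates directly into code. The sketch below, assuming bit $i$ of an integer holds $x_i$, checks by brute force that χ is a permutation for small odd lengths (and not for length 4), and computes the algebraic degree of χ and of its inverse with the Möbius transform, matching the forward degree 2 and backward degree $t + 1$ listed above.

```python
def chi(x, k):
    """Daemen's chi on a k-bit word: y_i = x_i XOR x_{i+1} x_{i+2} XOR x_{i+2}, indices mod k."""
    y = 0
    for i in range(k):
        xi = (x >> i) & 1
        x1 = (x >> ((i + 1) % k)) & 1
        x2 = (x >> ((i + 2) % k)) & 1
        y |= (xi ^ (x1 & x2) ^ x2) << i
    return y

def anf(truth):
    """Moebius transform: truth table of a Boolean function -> ANF coefficients."""
    a = list(truth)
    step = 1
    while step < len(a):
        for x in range(len(a)):
            if x & step:
                a[x] ^= a[x ^ step]
        step <<= 1
    return a

def algebraic_degree(table, k):
    """Largest algebraic degree among the k coordinate functions of a lookup table."""
    deg = 0
    for bit in range(k):
        coeffs = anf([(y >> bit) & 1 for y in table])
        deg = max([deg] + [bin(m).count("1") for m, c in enumerate(coeffs) if c])
    return deg

results = {}
for k in (3, 4, 5, 7):
    table = [chi(x, k) for x in range(1 << k)]
    if len(set(table)) != 1 << k:
        results[k] = None                 # even length: chi is not invertible
        continue
    inverse = [0] * (1 << k)
    for x, y in enumerate(table):
        inverse[y] = x
    results[k] = (algebraic_degree(table, k), algebraic_degree(inverse, k))

print(results)   # {3: (2, 2), 4: None, 5: (2, 3), 7: (2, 4)}
```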

Regardless of the length of χ (and hence the size of the S-box), we can formulate the following properties of the whole scheme:
1. For the standard block size of 128 bits, we get approximately $2^7$ input variables (approximately, since the S-box size might not divide 128). Thus each output coordinate of $b$ is a polynomial of degree 4 in $2^7$ variables with about $2^{24.5}$ terms, so the full scheme description is about $2^{24.5+7-3} = 2^{28.5}$ bytes, or about 300 MBytes.

2. The private key is much more compact and is dominated by three matrices with $2^{14}$ bits each (hence about $2^{13}$ bytes in total). If the matrices are deterministically produced out of some secret key (e.g., a 128-bit key), the description is even smaller.

3. The inverse polynomial of our schemes has degree $(t + 1)^2$ for S-boxes of size $2t + 1$.

The S-box size (length of χ) has negligible effect on the performance because of the internal structure of χ and its inverse; hence it only affects the security of the scheme. We choose a single S-box χ of length 127, so that its inverse has degree 64, and will show later that a system with small S-boxes is insecure.

In order to defeat decomposition algorithms and hide the ASASA structure, we suggest using perturbation polynomials. More precisely, we propose 24 random polynomials of degree 4 for the perturbation layer $a_p$. We pad each plaintext with 8 zero bits, so that for each guess the probability to fit the padding is $2^{-8}$. As a result, we get $2^{16}$ candidate plaintexts, and then check whether we correctly computed the noise. This filters out all wrong plaintexts with probability $1 - 2^{-8}$.
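These figures follow from a short count, assuming that for a wrong guess of the 24 noise bits both the padding check and the recomputation of $a_p$ behave like uniformly random bits:

$$
\begin{aligned}
\text{guesses of the noise} &: 2^{24},\\
\text{candidates fitting the 8-bit zero padding} &: 2^{24} \cdot 2^{-8} = 2^{16},\\
\mathbb{E}[\text{wrong candidates passing the 24-bit recheck of } a_p] &\le 2^{16} \cdot 2^{-24} = 2^{-8},
\end{aligned}
$$

so, by Markov's inequality, every wrong plaintext is rejected with probability at least $1 - 2^{-8}$.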

Overall security. We have found a number of attacks on different variants of the χ-scheme (Section 3), so it appears that its algebraic structure renders it vulnerable. Nevertheless, the variant with added perturbation remains unbroken, and we offer it as a cryptanalytic challenge, but not for practical use. We expect the theory of perturbations to develop in the near future, which would suggest a more secure set of parameters.


2.5 Scheme with Expanding S-Boxes 

This variant provides a more compact description of the scheme since we may switch to a larger field. First, we want the nonlinear layer to be a degree-2 polynomial over $\mathbb{F}_q$, $q > 2$, and define the linear (affine) transformations over the same field. Though a few examples of bijective transformations of degree 2 over a field other than $\mathbb{F}_2$ exist [36], they appear to be vulnerable to Groebner basis attacks in our own experiments. As a solution, we suggest expanding S-boxes, whose output is twice as long as the input. It is rather easy to design injective S-boxes of degree 2 with this property. Indeed, a random function with expansion rate 2 has no collisions with probability around 1/2, and hence there are enough injective transformations of the desired form.
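Two quick checks of this argument, as a sketch. The first computes the exact probability that a uniformly random map from 16 to 32 bits is injective. The second is a toy that works over $\mathbb{F}_2$ rather than $\mathbb{F}_{16}$ and uses 6-bit inputs (both assumptions of the sketch): it draws random maps whose 12 output coordinates have degree at most 2 until one is injective. The function names are hypothetical.

```python
import itertools
import math
import random

def injective_probability(in_bits, out_bits):
    """Exact probability that a uniformly random map {0,1}^in_bits -> {0,1}^out_bits is injective."""
    n_out = 1 << out_bits
    return math.exp(sum(math.log1p(-i / n_out) for i in range(1 << in_bits)))

def random_quadratic_map(rng, m):
    """Random map GF(2)^m -> GF(2)^(2m) whose coordinates have degree <= 2."""
    monos = [0] + [1 << i for i in range(m)]
    monos += [(1 << i) | (1 << j) for i, j in itertools.combinations(range(m), 2)]
    coords = [[mo for mo in monos if rng.getrandbits(1)] for _ in range(2 * m)]

    def f(x):
        return sum((sum(1 for mo in c if (x & mo) == mo) & 1) << b
                   for b, c in enumerate(coords))
    return f

def find_injective_quadratic(rng, m, max_tries=1000):
    """Rejection sampling: draw quadratic expanding maps until one is injective."""
    for tries in range(1, max_tries + 1):
        f = random_quadratic_map(rng, m)
        if len({f(x) for x in range(1 << m)}) == 1 << m:
            return f, tries
    return None, max_tries

print(round(injective_probability(16, 32), 3))   # 0.607, i.e. "around 1/2"
sbox, tries = find_injective_quadratic(random.Random(3), 6)
print(tries)
```

The sketch only illustrates that injective quadratic expanding maps are easy to find; it says nothing about their cryptographic quality.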

Here is a summary of the scheme:

• Input length 128 bits (32 variables), output length 512 bits (128 variables);

• All polynomials and affine transformations are defined over $\mathbb{F}_{16}$;

• S-boxes map 16 bits to 32 bits and hence are described by 8 degree-2 polynomials over $\mathbb{F}_{16}$ in four variables. The inverse is computed with lookup tables of size $2^{16}$;

• The first nonlinear layer has 8 S-boxes and doubles the state size to 256 bits. The second layer has 16 S-boxes and further doubles it to 512 bits. Accordingly, the affine transformations $S$, $T$, $U$ operate on 128-, 256-, and 512-bit states, respectively.

The output of the scheme is a set of $2^7$ degree-4 polynomials over $\mathbb{F}_{16}$ in 32 input variables (each variable is encoded with 4 bits). There are about $2^{16.5}$ possible terms; hence, taking 4-bit constants into account, each polynomial is described by $2^{20.5}$ bits, and the whole scheme by $2^{20.5+7-3} = 2^{24.5}$ bytes, which is about 24 MBytes.

The private key is smaller: the affine layers contain $2^{7+7+1} + 2^{6+6+1} + 2^{5+5+1}$ elements of $\mathbb{F}_{16}$. The 48 S-boxes are described as $2^{5.5+3}$ polynomials of $21 \approx 2^{4.5}$ terms each, hence $2^{13}$ elements, plus a few noise polynomials. In total, the private key fits into $2^{14}$ bytes.

We also suggest using perturbation polynomials here. Due to the large expansion rate, we can use a rather dense perturbation layer $a_p$ and still ensure unique decryption. We use two random polynomials of degree four over $\mathbb{F}_{16}$ at each S-box, hence 32 polynomials in total. While decrypting, we face $16^2 = 2^8$ options for each S-box output. As a result, the probability of having non-unique decryption of the last S layer is $2^4 \cdot 2^{16+8-32} = 2^{-4}$, and if this happens the next layer filters out wrong candidates.
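The bound follows from a union bound over the $16 = 2^4$ S-boxes of the last layer, assuming that each wrong (input, noise) pair of an S-box reproduces the observed 32-bit output with probability about $2^{-32}$:

$$
\Pr[\text{non-unique decryption}] \le \underbrace{2^{4}}_{\text{S-boxes}} \cdot \underbrace{2^{16}}_{\text{inputs}} \cdot \underbrace{2^{8}}_{\text{noise values}} \cdot \underbrace{2^{-32}}_{\text{spurious match}} = 2^{-4}.
$$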

As we already mentioned, the expanding character of the scheme allows only 
public-key and white-box encryption, but not signature generation. 


3 Security Analysis of Our White-Box/Public-Key Schemes

In this section we apply various attacks to weakened versions of our schemes, thus demonstrating the design rationale behind them. We demonstrate that the added perturbations are crucial in both schemes, and that they must be secret. We also show that S-boxes in the χ-scheme must be large, that linearity (in contrast to affinity) of A may weaken the scheme, and that the expanding S-boxes should not be biased (these results are presented mainly in [8]). Our attacks are summarized in Table 1.

These attacks allow us to evaluate the security margin of the unbroken variants of our schemes. Since only the perturbation protects the χ-scheme from a number of practical attacks, we conclude that it is rather fragile, but it might become a good candidate for a strong white-box implementation once the complexity of generic algorithms applied to the perturbed version is better understood. In contrast, the expanding scheme appears to be more resistant to generic attacks, and we propose it as a ready-to-use public-key encryption scheme and a strong white-box implementation.
Table 1. Summary of our attacks on the weakened versions of our schemes. D stands for the complexity of decomposition attacks.

Scheme             Weakening                     Attack complexity   Attack type      Reference
Expanding scheme   Public perturbation           2^45 + D            Interpolation    Section 3.2
Expanding scheme   Biased S-boxes (bias = 1/8)   2^88                LPN              Section 3.4
χ-scheme           Public perturbation           2^57 + D            Interpolation    Section 3.2
χ-scheme           No perturbation               ≈ 2^40              Groebner basis   Section 3.3
χ-scheme           Small S-boxes                 2^45                Algebraic        [8]


3.1 Generic Attacks 

Given the public key of a multivariate scheme, an attacker may directly try to solve the multivariate polynomial equations using a generic algorithm. If the public key is a vector of $m$ polynomials in $n$ variables over $\mathbb{F}_q$, then a plaintext can always be found by exhaustive search in time $O(q^n)$. The other main family of algorithms to solve systems of polynomial equations are Groebner-basis algorithms, such as Buchberger’s algorithm and all its derivatives [27, 28].

Without going into details (the interested reader is referred to a standard textbook such as [18]), given a system of polynomial equations $f_1 = \dots = f_m = 0$ in $x_1, \dots, x_n$, a Groebner basis of the ideal spanned by the $f_i$'s is an equivalent system of equations with nice properties. If the system admits a single solution $(a_1, \dots, a_n)$, then a Groebner basis is precisely the vector of polynomials $x_1 - a_1, \dots, x_n - a_n$. It follows that if a Groebner basis can be computed, then the system of equations can be solved.

Groebner basis algorithms work by performing polynomial elimination, i.e., by trying to eliminate some terms by summing suitable multiples of other polynomials. The complexities of these algorithms are difficult to analyze [...]. They are essentially exponential in the highest degree reached by the polynomials created and manipulated by the algorithms during their execution. On “generic” systems of $n$ equations in $n$ variables, this degree is typically $n$. However, in some special cases it can be lower. For instance, the first HFE Challenge could be broken because in HFE, for some ranges of parameters, this degree was roughly $O(\log n)$.

3.2 Interpolation Attack on the ASASA Scheme with Public Perturbation Polynomials

We stressed that the perturbation polynomials must be secret. A reader may wonder why this is required, since these polynomials are seemingly mixed by the last affine transformation $U$.

In this subsection we outline an attack that peels off the perturbation polynomials and recovers the core ASASA scheme in almost practical time. Suppose we work over $\mathbb{F}_2$, the scheme adds perturbation polynomials at $r$ bit positions after the nonlinear transformation $a_2$ (cf. Eq. (2)), and the total number of variables in the scheme is $n$. Then we collect $N$ plaintexts $x_i$ such that

$$a_p(x_i) = 0.$$

Since the polynomials of $a_p$ do not have any structure, finding a common zero is an NP-hard problem, and we expect that $2^r$ plaintexts must be tried to find a right one. Hence the naive complexity of this step is $N \cdot 2^r$ evaluations⁴ of $a_p$.

Then we evaluate the right plaintexts on the perturbed scheme $b_p$. Since $a_p(x_i) = 0$, we have

$$b_p(x_i) = U \circ a_2 \circ T \circ a_1 \circ S(x_i).$$

Therefore, we know the evaluation of the ASASA scheme without perturbations on $N$ plaintexts. Since the scheme has degree 4, the polynomial coefficients can be recovered by Lagrange interpolation. There are about $\binom{n}{4}$ monomials of degree 4 or smaller, hence $N$ must slightly exceed $\binom{n}{4}$ to allow for linear dependencies among plaintexts. For the typical value $n = 2^7$ we need about $2^{25}$ right plaintexts to fully recover the core ASASA polynomials and then launch the decomposition attack. However, the interpolation itself is not a trivial procedure, since we deal with a multivariate function. Only recently has an algorithm with complexity quadratic in the number of monomials been proposed [...]. Equipped with it, we recover a single polynomial in $2^{50}$ bit operations, and the entire $b_p$ in $2^{57}$ operations. In turn, $2^{25}$ right plaintexts can be obtained for 16 noisy bits in $2^{41}$ evaluations of $a_p$, and for 24 noisy bits in $2^{49}$ evaluations, which is close to $2^{55}$ bit operations. Therefore, the total complexity of recovering $b(x)$ is about $2^{57}$ bit operations.
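A toy run of this attack, as a sketch under stated assumptions: a single output coordinate, $n = 10$ variables, a core polynomial of degree at most 3 instead of 4, and $r = 2$ public perturbation functions given as random truth tables. Instead of the quadratic-time multivariate interpolation algorithm cited above, it simply solves the linear system for the monomial coefficients by Gaussian elimination over GF(2); all names are hypothetical.

```python
import itertools
import random

def monomials(n, d):
    """All monomials of degree <= d in n Boolean variables, encoded as bitmasks."""
    return [sum(1 << i for i in c)
            for k in range(d + 1)
            for c in itertools.combinations(range(n), k)]

def evaluate(coeffs, monos, x):
    return sum(c for c, m in zip(coeffs, monos) if (x & m) == m) & 1

def solve_gf2(rows, ncols):
    """Solve a GF(2) system: bit j < ncols of a row is the coefficient of unknown j,
    bit ncols is the right-hand side. Returns None if the system is underdetermined."""
    pivots = {}                                  # column -> row whose lowest set bit is that column
    for r in rows:
        for c in range(ncols):
            if (r >> c) & 1:
                if c in pivots:
                    r ^= pivots[c]
                else:
                    pivots[c] = r
                    break
    if len(pivots) < ncols:
        return None
    sol = [0] * ncols
    for c in sorted(pivots, reverse=True):       # back-substitution from the last column
        r = pivots[c]
        val = (r >> ncols) & 1
        for c2 in range(c + 1, ncols):
            if (r >> c2) & 1:
                val ^= sol[c2]
        sol[c] = val
    return sol

rng = random.Random(7)
n, d, r = 10, 3, 2                               # toy sizes (the paper has n = 2^7, degree 4)
monos = monomials(n, d)                          # 1 + 10 + 45 + 120 = 176 unknowns
secret_b = [rng.getrandbits(1) for _ in monos]   # one coordinate of the core map b
perturb = [[rng.getrandbits(1) for _ in range(1 << n)] for _ in range(r)]  # public a_p

def b_p(x):
    """Public, perturbed coordinate: b(x) plus the perturbation values, mod 2."""
    return (evaluate(secret_b, monos, x) + sum(p[x] for p in perturb)) & 1

# Step 1: keep the right plaintexts, on which every perturbation polynomial vanishes.
right = [x for x in range(1 << n) if not any(p[x] for p in perturb)]
# Step 2: there b_p(x) = b(x), so interpolation reduces to linear algebra over GF(2).
rows = []
for x in right:
    row = sum(1 << j for j, m in enumerate(monos) if (x & m) == m)
    rows.append(row | (b_p(x) << len(monos)))
recovered = solve_gf2(rows, len(monos))
print(len(right), len(monos), recovered == secret_b)
```

Because the right plaintexts outnumber the 176 unknown coefficients, the system is overdetermined and the recovered coefficients coincide with the secret ones with overwhelming probability.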

This attack clearly shows that the perturbation polynomials must not be 
public and should not have any structure that would allow the adversary to find 
their common zeros. We do not see how the attack can be applied to secret 
polynomials. 


3.3 Algebraic Attack on the Plain χ-Scheme

Although χ has been used successfully in the symmetric world, it turns out to be a complete disaster in a multivariate context. An ASA construction where S = χ, with $n = 127$ variables over $\mathbb{F}_2$, is broken in a few seconds by a direct Groebner basis computation. A two-layer ASASA construction is not more secure, and can be broken in less than two hours using the implementation of the F4 algorithm in the MAGMA computer algebra system [14] (and 100 GBytes of RAM). This happens because a Groebner basis can be computed by manipulating polynomials of small, constant degree (typically 3 or 6).

Let us give a detailed explanation of the insecurity of the ASA construction. We work within the polynomial ring $R = \mathbb{F}_2[x_0, \dots, x_{n-1}]$, and we consider the ideal of $R$:

$$\mathcal{I} = \langle f_0, \dots, f_{n-1},\; x_0^2 - x_0, \dots, x_{n-1}^2 - x_{n-1} \rangle,$$

where $f_i = x_i + x_{i+2} + x_{i+1} x_{i+2} + a_i$ (all indices are taken modulo $n$), and where the $a_i$ are constants. Any solution (in the $x_i$'s) making all the polynomials in $\mathcal{I}$ vanish simultaneously is a solution of $\chi(x_0, \dots, x_{n-1}) = (a_0, \dots, a_{n-1})$. Such a solution always exists, and is unique.

⁴ Finding subsequent solutions might be easier, but this step is not dominant in our attack complexity.

We will show that there are many linear polynomials in this ideal, and that they can be “easily” discovered (by manipulating small-degree polynomials). Indeed:

$$x_{i+1} \cdot f_i - x_{i+2} \cdot (x_{i+1}^2 - x_{i+1}) + f_{i-1} = (x_{i-1} + x_{i+1}) - (a_{i-1} + a_i x_{i+1}).$$

The expression on the left-hand side is a polynomial combination of elements of $\mathcal{I}$, therefore it belongs to $\mathcal{I}$. As a consequence, the linear polynomial on the right-hand side can be found inside $\mathcal{I}$ after performing a few steps of polynomial elimination on polynomials of degree less than 3.

After these $n$ linear relations have been found, another few steps of polynomial elimination allow all the variables but one to disappear. This shows that a Groebner basis of the ideal $\mathcal{I}$ can be computed in polynomial time. Now, performing a (random) linear change of coordinates in $\mathcal{I}$, or replacing the generators of $\mathcal{I}$ by (random) linear combinations thereof, does not change this fact. In conclusion, the ASA construction where S is the χ-function falls victim to a direct algebraic attack, by running any Groebner basis algorithm on the equations defining the “white-box”.
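The linear relations can be checked numerically. In characteristic 2 the right-hand side above reads $x_{i-1} + (1 + a_i)\,x_{i+1} + a_{i-1}$; since each relation links $x_{i-1}$ to $x_{i+1}$ and $n$ is odd, the chain of relations visits every variable, so at most one free bit remains. The sketch below, for a plain χ layer on $n = 7$ bits with the affine layers omitted (an assumption that only simplifies the bookkeeping), verifies that every preimage satisfies the $n$ relations and that they leave at most two candidates, which χ itself then disambiguates. The brute-force enumeration over $2^7$ values stands in for the Gaussian elimination a real attack would use.

```python
def chi(x, k):
    """Daemen's chi on k bits: y_i = x_i XOR x_{i+1} x_{i+2} XOR x_{i+2}, indices mod k."""
    def bit(i):
        return (x >> (i % k)) & 1
    return sum((bit(i) ^ (bit(i + 1) & bit(i + 2)) ^ bit(i + 2)) << i for i in range(k))

def relations_hold(z, a, k):
    """Check x_{i-1} + (1 + a_i) x_{i+1} + a_{i-1} = 0 over GF(2) for every i."""
    def zb(i):
        return (z >> (i % k)) & 1
    def ab(i):
        return (a >> (i % k)) & 1
    return all((zb(i - 1) ^ ((1 ^ ab(i)) & zb(i + 1)) ^ ab(i - 1)) == 0 for i in range(k))

k = 7
for x in range(1 << k):
    a = chi(x, k)                                   # public output of the chi layer
    # Enumerate only to count solutions; an attacker would solve the relations linearly.
    candidates = [z for z in range(1 << k) if relations_hold(z, a, k)]
    assert x in candidates and len(candidates) <= 2
    assert [z for z in candidates if chi(z, k) == a] == [x]
print("the linear relations leave at most 2 candidates for every output")
```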

This reasoning extends to the ASASA construction where both nonlinear layers are χ (however, this time the degree is 6). It is an open question how much the added perturbation slows down the Groebner-basis attacks (our implementation does not break the selected noise parameters in reasonable time).

3.4 Attack on the Expanding Scheme with Biased S-Boxes 

If S-box output bits are biased, an attack exploiting this bias can be applied. The last affine transformation can be viewed as affine over $\mathbb{F}_2$, so without loss of generality the following analysis applies to any field of characteristic two.

We target a single biased bit $b$ after the second layer of expanding S-boxes: the probability $\Pr[b = 1]$ that it equals 1 is $p \ne 1/2$. If $y$ is a ciphertext, then, following previous notation, the biased bit is a fixed component of $U^{-1} \cdot y$. In other terms, if $u$ denotes the corresponding row of $U^{-1}$, then $\langle u, y \rangle = b$ (up to an additive constant coming from the affine part of $U$).

Now, assume we collect a large number (say $N$) of ciphertexts. We stack them vertically into a matrix $C$, which thus has $N$ rows. Let us also assume that $b$ is biased towards zero. Then we have the “noisy linear system”

$$C \cdot u^{T} = e,$$

where $e$ is a vector of i.i.d. random variables following the Bernoulli distribution with mean $p$. Recovering $u$ is exactly an instance of the Learning Parity with Noise (LPN) problem.
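A toy illustration of this noisy linear system, assuming $n = 10$ instead of 512, $p = 1/8$, and ciphertexts modeled as uniform vectors conditioned on the bias (the sampling helper is hypothetical). The secret row is found by exhaustive search over all $2^{10}$ candidates, picking the one whose parities are most biased; this is a brute-force stand-in and not an implementation of BKW.

```python
import random

def parity(v):
    return bin(v).count("1") & 1

def biased_ciphertexts(rng, u, n, count, p):
    """Uniform n-bit vectors y, conditioned so that <u, y> = 1 with probability p < 1/2."""
    out = []
    while len(out) < count:
        y = rng.getrandbits(n)
        if parity(u & y) and rng.random() >= p / (1 - p):
            continue                                  # rejection step that creates the bias
        out.append(y)
    return out

rng = random.Random(1)
n, count, p = 10, 300, 1 / 8                          # toy sizes; the paper's instance has n = 512
u = rng.randrange(1, 1 << n)                          # secret: a row of U^{-1}
C = biased_ciphertexts(rng, u, n, count, p)           # rows of the matrix C

def weight(w):
    """Hamming weight of C . w^T, i.e. how often the candidate parity equals 1."""
    return sum(parity(w & y) for y in C)

best = min(range(1, 1 << n), key=weight)              # exhaustive search, not BKW
print(best == u, weight(u), count)
```

Wrong candidates $w \ne u$ give parities that are unbiased, so their weights cluster around $N/2$, while the secret row gives about $pN$.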
The best known algorithms to solve LPN are variants of the BKW algorithm [...], whose complexity is of order $2^{O(n/\log n)}$. The only actual implementation (along with algorithmic improvements) is described in [37], and some more tweaks are given in [3]. The actual complexities of these algorithms depend on the bias (their efficiency decreases when the bias gets closer to zero).

With $n = 512$ variables and $\Pr[b = 1] = 1/8$, the implementation of [37] is said to require $2^{80}$ bits of memory (plus the time needed to sort this much memory 80 times). However, time-memory tradeoffs, plus algorithmic improvements, allow [3] to conclude that the same problem can be solved with $2^{59}$ bits of memory and fewer than $2^{100}$ bit operations. If $2^{80}$ bits of memory are available, then the running time could be decreased to $2^{88}$ bit operations.

This beats more naive approaches, such as enumerating all the possible sparse values of the first $n$ components of $e$ and solving the corresponding linear system for each trial. The above instance would require more than $2^{120}$ operations using the naive approach.

Of course, the attack has to be repeated for each row of $U^{-1}$, and possibly twice for each row (assuming that the targeted bit is biased towards zero, or towards one). Note that the above estimates are extremely pessimistic: in random expanding S-boxes of degree 2, the biases we observed experimentally are much lower than what was used above (we observed $\Pr[b = 1] \approx 0.49$).

After U is recovered, we can view the output of expanding S-boxes, and are 
likely to recover them by interpolation due to low degree. 


4 Black-Box ASASA Schemes 

Given the rather low performance and large key size of the public-key ASASA schemes, a reader may wonder whether a significant performance increase can be achieved with lower security goals. We answer this question in two ways. First, we propose a generic black-box symmetric cipher based on the five-layer ASASA structure. The cipher is expected to have a very fast software implementation thanks to vector instructions in modern processors. Second, we use a small version of this cipher as a building block in achieving weak white-box security (Section 5).


4.1 Design 

We propose a symmetric cipher with a classical set of parameters, widely used in AES and other designs. It has a block of $n = 128$ bits with $m = 8$-bit S-boxes and a choice of key sizes from 128 to 256 bits. Let us outline the specific parameters for the linear and nonlinear layers.

Affine layers. A key-dependent $n \times n$ affine transformation can be produced out of the master key $K$ by any secure key derivation function $H_K$ (for example a fast stream cipher, or a block cipher in counter mode; more details in [8]) and by checking that the resulting matrix is invertible, which can be done in $O(n^3)$ steps; we also generate an $n$-bit constant. The branch number of the matrix [20] determines the minimum number of active S-boxes in a differential trail, and thus the upper bound on the trail probability. Since the matrix is random, we expect the branch number to be close to the maximum possible (the number of S-boxes $n/m$ plus one). Note that a new matrix is generated for each affine layer.
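A sketch of this key schedule, assuming SHAKE-256 (Python's `hashlib.shake_256`) as a stand-in for the key derivation function $H_K$; the labels, the byte layout and the function names are illustrative choices, not the paper's specification. Matrices are drawn until one has full rank, which happens for roughly 29% of random $128 \times 128$ matrices over GF(2).

```python
import hashlib

def keyed_stream(key, label, nbytes):
    """Hypothetical stand-in for H_K: SHAKE-256 of key and label (not the paper's KDF)."""
    return hashlib.shake_256(key + b"|" + label).digest(nbytes)

def gf2_rank(rows):
    """Rank over GF(2) of row bitmasks (Gaussian elimination on leading bits)."""
    pivots = {}
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = r
                break
            r ^= pivots[lead]
    return len(pivots)

def derive_affine_layer(key, layer, n=128):
    """Draw n x n GF(2) matrices from the keyed stream until one is invertible.
    Returns (row bitmasks, n-bit constant, number of rejected draws)."""
    attempt = 0
    while True:
        label = b"affine-%d-%d" % (layer, attempt)
        bits = int.from_bytes(keyed_stream(key, label, (n * n + n + 7) // 8), "big")
        rows = [(bits >> (i * n)) & ((1 << n) - 1) for i in range(n)]
        const = bits >> (n * n)
        if gf2_rank(rows) == n:                 # O(n^3) bit operations
            return rows, const, attempt
        attempt += 1

key = bytes(range(16))                          # example 128-bit master key
layers = [derive_affine_layer(key, i) for i in range(3)]   # a fresh matrix per affine layer
print([attempts for _, _, attempts in layers])  # number of rejected (singular) draws
```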

Nonlinear layers. Typically, nonlinear layers of symmetric ciphers consist of several small S-boxes, which have a compact description [12, 20]. For the ASASA scheme we propose to use 32 randomly generated 8-bit invertible S-boxes, which are all different and key-dependent. We note that the efficiency of generic attacks on the SASAS structure [10] increases if smaller S-boxes are used, and thus it may be interesting in the future to explore full-block-size nonlinear layers, for which such attacks would not work.
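Along the same lines, the 32 key-dependent S-boxes (two layers of 16) can be drawn with a Fisher-Yates shuffle driven by the keyed stream. Again SHAKE-256 stands in for $H_K$ and all names are hypothetical; the sketch checks that the S-boxes are permutations, that they are all different, and that a layer can be inverted.

```python
import hashlib

def keyed_stream(key, label, nbytes):
    """Hypothetical stand-in for H_K: SHAKE-256 of key and label (not the paper's KDF)."""
    return hashlib.shake_256(key + b"|" + label).digest(nbytes)

def derive_sbox(key, layer, index):
    """Key-dependent random 8-bit permutation via a Fisher-Yates shuffle."""
    rnd = keyed_stream(key, b"sbox-%d-%d" % (layer, index), 512)
    box = list(range(256))
    for i in range(255, 0, -1):
        # 16 random bits reduced mod (i + 1): a small modulo bias, tolerable in a sketch
        j = int.from_bytes(rnd[2 * i:2 * i + 2], "big") % (i + 1)
        box[i], box[j] = box[j], box[i]
    return box

def invert(box):
    inv = [0] * 256
    for x, y in enumerate(box):
        inv[y] = x
    return inv

def s_layer(boxes, state):
    """Apply 16 parallel 8-bit S-boxes to a 128-bit integer state."""
    return sum(box[(state >> (8 * j)) & 0xFF] << (8 * j) for j, box in enumerate(boxes))

key = bytes(range(16))
layers = [[derive_sbox(key, layer, j) for j in range(16)] for layer in range(2)]
state = int.from_bytes(b"sixteen byte blk", "big")
out = s_layer(layers[0], state)
back = s_layer([invert(b) for b in layers[0]], out)
print(back == state)   # True
```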

Large S-box alternatives. The choice of large-block algorithmic S-boxes is
surprisingly limited. Unless the S-boxes are themselves multi-layer permutations
(e.g., fixed-key ciphers), a compact description is typically given in algebraic
form, as a function over an appropriate finite field. The resulting permutation
polynomials have become an active research topic in recent years. The
well-known example is $X^{2^k+1}$ over $\mathbb{F}_{2^n}$ (scheme C* [39]); more
recent and interesting examples include $(X^{2^k}+X+\delta)^s+X$ over $\mathbb{F}_{2^n}$
by Zeng et al. [50] (derived from Helleseth-Zinoviev polynomials; more references
in [8]). It could thus be an interesting second challenge to break the symmetric
ASASA scheme with known block-wide nonlinear layers. Note, however, that fixed
S-boxes offer no implementation advantage, and thus we keep the S-boxes secret
and randomly generated in the main variant of our scheme.
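For the C* example, the monomial $X^d$ with $d = 2^k+1$ permutes $\mathbb{F}_{2^n}$ exactly when $\gcd(d, 2^n-1) = 1$, a standard fact about power maps over finite fields. The sketch below checks this criterion against brute force. The helper names are hypothetical, and the two irreducible polynomials are chosen only for illustration: $X^7+X+1$ for $\mathbb{F}_{2^7}$ and the AES polynomial $X^8+X^4+X^3+X+1$ for $\mathbb{F}_{2^8}$.

```python
from math import gcd

POLY_F128 = 0b10000011   # X^7 + X + 1 (irreducible over GF(2))
POLY_F256 = 0x11B        # X^8 + X^4 + X^3 + X + 1 (the AES polynomial)


def gf_mul(a: int, b: int, n: int, poly: int) -> int:
    """Multiply a and b in GF(2^n) = GF(2)[X]/(poly); poly includes the X^n term."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= poly
    return r


def gf_pow(a: int, e: int, n: int, poly: int) -> int:
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a, n, poly)
        a = gf_mul(a, a, n, poly)
        e >>= 1
    return r


def cstar_is_permutation(n: int, k: int, poly: int) -> bool:
    """Brute force: does X^(2^k + 1) hit every element of GF(2^n)?"""
    d = (1 << k) + 1
    return len({gf_pow(x, d, n, poly) for x in range(1 << n)}) == 1 << n


def cstar_criterion(n: int, k: int) -> bool:
    return gcd((1 << k) + 1, (1 << n) - 1) == 1
```

For n = 7, every exponent $2^k+1$ with $1 \le k \le 6$ is coprime to the prime 127. For n = 8 and every k from 1 to 7, the exponent shares a factor with 255 = 3 · 5 · 17, so none of these C*-style monomials permutes $\mathbb{F}_{2^8}$.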

Implementation. The implementation details can be found in the full version of 
the paper [8]. 


4.2 Security Analysis 

Differential and linear attacks. We expect the secret linear layers to hide all
differential [6] and linear [38] properties of the cipher, since it becomes impossible
to figure out any high-probability differential or linear trail. It can be argued,
however, that the existence of high-probability characteristics may lead to efficient
distinguishers. For instance, Dunkelman and Keller showed in [26] that if for
every α the differential probabilities {α → β} are much higher (or much lower)
than for a random permutation, then this can be used as a distinguisher. The
authors further suggested the parameter of effective linearity, which essentially
measures the average probability of the boomerang difference quartet (α, κ)
over all possible α, κ and is supposed to take even unknown characteristics into
account. In the full version of this paper [5] we show that none of these methods
leads to attacks much faster than exhaustive key search.

5 Non-anonymous final version of this paper will link to implementations of our
schemes and to challenges of gradually increasing complexity for the interested
cryptanalysts.
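The differential probabilities {α → β} of a single S-box are collected in its difference distribution table (DDT). As an illustration only (not part of the paper's analysis), the sketch below computes the DDT of an 8-bit permutation and its maximum differential probability. In ASASA, the secret affine layers hide which of these entries an attacker would need to chain into a trail. The function names are hypothetical.

```python
import random


def ddt(sbox):
    """Difference distribution table: ddt[a][b] = #{x : S(x) XOR S(x XOR a) == b}."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        row = table[a]
        for x in range(size):
            row[sbox[x] ^ sbox[x ^ a]] += 1
    return table


def max_differential_probability(sbox):
    """Largest DDT entry over nonzero input differences, divided by the table size."""
    size = len(sbox)
    t = ddt(sbox)
    return max(max(t[a]) for a in range(1, size)) / size


rng = random.Random(1)
sbox = list(range(256))
rng.shuffle(sbox)
print(max_differential_probability(sbox))
```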

Algebraic attacks. We expect a random S-box of width m to have algebraic
degree $m - 1 = 7$. As a result, the entire scheme can be represented by
a polynomial of degree 49 over $\mathbb{F}_2$. As observed by Meier [?], the low degree
can be detected by applying higher-order differentials of the same order to the
ciphertext. Therefore, an attacker can distinguish the ASASA construction from
random given $2^{49}$ chosen data and time. However, this does not lead to the
disclosure of the plaintext, and it is unclear how this property can be exploited.
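The degree claim can be checked on a single S-box with the Möbius transform, which converts a truth table into algebraic normal form (ANF). In this sketch (helper names hypothetical), a bijective m-bit S-box has degree at most m − 1. The reason is that the coefficient of the top monomial equals the XOR of the coordinate over all inputs, which is zero for a permutation. That vanishing XOR over a cube is exactly what a higher-order differential exploits. Composing two S layers multiplies the degree bounds, which is where 7 · 7 = 49 comes from.

```python
import random


def anf(truth_table):
    """Moebius transform: ANF coefficients of a Boolean function given as a 0/1 list."""
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1
    for i in range(n):
        step = 1 << i
        for x in range(len(coeffs)):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
    return coeffs


def algebraic_degree(sbox, m):
    """Maximum, over all output bits, of the largest monomial with a nonzero ANF coefficient."""
    deg = 0
    for bit in range(m):
        c = anf([(sbox[x] >> bit) & 1 for x in range(1 << m)])
        deg = max(deg, max((bin(u).count("1") for u in range(1 << m) if c[u]), default=0))
    return deg


rng = random.Random(7)
sbox = list(range(256))
rng.shuffle(sbox)
print(algebraic_degree(sbox, 8))
```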

Other attacks. Boomerang and impossible differential attacks may also be of
concern. We have tried basic and improved versions of these attacks, and in all
cases the randomness of the affine layers prevented us from mounting an attack.
However, it is possible to build boomerang quartets in the known-key setting
by activating a single S-box on both sides of the boomerang. Whether such
properties can be carried over to the secret-key setting is a subject for future
research. Impossible differential attacks typically rely on truncated differentials
with probability 1, which exist in some ciphers due to incomplete diffusion. Since
in our case the random affine layers provide complete diffusion, and since both
the entrance into and the exit from the scheme are guarded by these affine
layers, chosen-plaintext attacks have little chance of predicting truncated values
somewhere inside the scheme.

Our scheme should be more secure than a two-round Even-Mansour cipher [?],
where the subkeys are simply XORed to the internal state (as opposed to applying
a full-blown secret affine transformation to the internal state). The recent attack
on the 2-round Even-Mansour cipher [23] explicitly requires access to the internal
permutation and thus cannot be immediately used in our setting.

Meet-in-the-middle attacks [33] do not apply to our scheme, because the
amount of key material used to compute any matching variable is too large
(several S-boxes and a large part of the affine transformation). Cube attacks
do not apply either, since there is no compact polynomial representation of the scheme.

Structural attack. Finally, we investigate the structural attack from [10]. We will
see that even though it does not apply to the 128-bit ASASA cipher, it bounds
the security level of schemes with smaller blocks, which are used in
Section 5. First, we recall the main property preserved by the SASA structure
with m-bit S-boxes:

Theorem 1 ([10]). Let $\{x_1, x_2, \ldots, x_{2^m}\}$ be the inputs to the SASA structure
such that the input bits to one S-box take all possible combinations whereas the
other bits are constant. Then the XOR of all outputs is the all-zero bit vector:

$$\bigoplus_{i=1}^{2^m} \mathrm{SASA}(x_i) = (0, 0, \ldots, 0).$$
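Theorem 1 can be checked experimentally on a toy SASA instance. The sketch below assumes an 8-bit block with two 4-bit S-boxes per S layer (sizes chosen only so that everything fits in a few lines). It uses random S-boxes and random invertible affine layers, lets one S-box of the first layer run through all 16 inputs, and XORs the 16 outputs. The property holds because the affine image of the active set is an affine subspace. Every second-layer S-box therefore sees either each input value an even number of times or all values exactly once. Names are hypothetical.

```python
import random

N, M = 8, 4                  # toy sizes (assumption): 8-bit block, two 4-bit S-boxes per layer
MASK_M = (1 << M) - 1


def parity(v):
    return bin(v).count("1") & 1


def linear(rows, x):
    """GF(2) matrix-vector product; output bit i is the parity of rows[i] AND x."""
    return sum(parity(r & x) << i for i, r in enumerate(rows))


def random_s_layer(rng):
    return [rng.sample(range(1 << M), 1 << M) for _ in range(N // M)]


def apply_s_layer(layer, x):
    return sum(sb[(x >> (M * i)) & MASK_M] << (M * i) for i, sb in enumerate(layer))


def random_affine(rng):
    while True:
        rows = [rng.getrandbits(N) for _ in range(N)]
        if len({linear(rows, x) for x in range(1 << N)}) == 1 << N:   # invertible
            return rows, rng.getrandbits(N)


def sasa_xor_sum(seed, active=0, const=0xA5):
    """XOR of SASA outputs when S-box `active` of the first layer sees all 2^M inputs."""
    rng = random.Random(seed)
    s1, s2 = random_s_layer(rng), random_s_layer(rng)
    (a1, c1), (a2, c2) = random_affine(rng), random_affine(rng)
    fixed = const & ~(MASK_M << (M * active))
    acc = 0
    for v in range(1 << M):
        x = fixed | (v << (M * active))
        y = apply_s_layer(s1, x)         # S
        y = linear(a1, y) ^ c1           # A
        y = apply_s_layer(s2, y)         # S
        y = linear(a2, y) ^ c2           # A
        acc ^= y
    return acc
```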
Now consider the input y to some m-bit S-box in the first S layer of the ASASA
scheme E. It is an affine function of the scheme input x:

$$y = M \cdot x \oplus c,$$

where M is an (m × n)-matrix and c is a constant. Let L be an (n × n)-matrix
such that

$$M \times L = (M' \;\; 0 \;\; 0 \;\; \cdots \;\; 0), \qquad (3)$$

where M' is an (m × m)-submatrix. If we apply E to L · x, then y depends
only on the first m bits of x. Thus we compose inputs $\{x_1, x_2, \ldots, x_{2^m}\}$ as in
Theorem 1, multiply them by L and apply E. The outputs must then sum to 0
bitwise. This property allows one to recover the outer affine layer and eventually
all the components of E. For a random L, Equation (3) holds with probability
$2^{-m(n-m)}$, which makes the attack impractical for large n. However, for small n
it might be efficient. Therefore, the maximum security level of the ASASA scheme
with n variables and m-bit S-boxes does not exceed (n − m)m bits.
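The probability that Equation (3) holds can be estimated numerically. In this sketch (helper names hypothetical), M is fixed to a full-row-rank matrix, L is drawn uniformly at random, and we count how often the last n − m columns of M × L vanish. For uniform L every column of M × L is uniform in $\mathbb{F}_2^m$, so the expected rate is $2^{-m(n-m)}$ and the attack cost is roughly its inverse, $2^{(n-m)m}$.

```python
import random


def parity(v):
    return bin(v).count("1") & 1


def condition_3_holds(m_rows, l_cols, m, n):
    """True iff columns m..n-1 of M x L are all zero (Equation (3))."""
    return all(parity(row & l_cols[j]) == 0 for j in range(m, n) for row in m_rows)


def estimate_probability(m, n, trials, seed=1):
    rng = random.Random(seed)
    # Hypothetical M: the first m unit vectors; any full-row-rank M gives the same probability.
    m_rows = [1 << i for i in range(m)]
    hits = 0
    for _ in range(trials):
        l_cols = [rng.getrandbits(n) for _ in range(n)]   # column j of a uniformly random L
        hits += condition_3_holds(m_rows, l_cols, m, n)
    return hits / trials


def structural_security_bits(n, m):
    """Upper bound (n - m) * m on the security level of n-bit ASASA with m-bit S-boxes."""
    return (n - m) * m


print(estimate_probability(2, 4, 40000), 2 ** -(2 * (4 - 2)))
```

With n = 128 and m = 8 the formula gives 960 bits. The small-block values discussed for the white-box components (64, 100 and 128 bits) follow from the same expression.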

For our design, this gives an upper bound of 960 bits, which is far larger
than the length of the key used to generate the affine and nonlinear layers.
As a result, we claim the 128-bit security level for our design, even though a
small factor over exhaustive key search might be saved by biclique attacks or by
exploiting the full codebook. This is still higher security than that offered by
2048-bit RSA.

5 Proposal for Weak White-Box Security: ASASA-Based 
Block Cipher 

5.1 Weak White-Box Security 

Definition 2. Let the pair of algorithms (E, D) be a private-key encryption
scheme, which takes key K as a parameter. We call $\mathbb{K}(K)$ the equivalent key
set for key K if from any element of $\mathbb{K}(K)$ it is easy to get an algorithm
equivalent to $E_K$, i.e. there is an (efficient) algorithm

$$\mathcal{A}(\mathcal{K}) \to \mathcal{E}', \qquad \mathcal{K} \in \mathbb{K}(K),$$

where $\mathcal{E}'$ is equivalent to $E_K$.

Definition 3. The function $\mathcal{O}_{E_K}$ is a T-secure weak white-box implementation
for $E_K$ if it is computationally hard to obtain $\mathcal{K} \in \mathbb{K}(K)$ of length less than T
given full access to $\mathcal{O}_{E_K}$.

In other words, an adversary who gets a secure weak white-box implementation
is unable to find any compact (shorter than T) equivalent representation
of it. In practical terms, an adversary who wants to share a protected
implementation of the encryption routine would have to share the entire code. Such
ciphers are motivated by DRM applications, which aim to prevent users of
protected content from sharing the information needed to decrypt it. Clearly,
in this context there is little practical difference between sharing the key and
sharing, say, the set of subkeys, as long as the other cipher operations are
independent of the key. Therefore, naive methods of key protection, e.g. transforming
the key with a preimage-resistant hash function, would not prevent an attack. Ideally,
the adversary would have to isolate and extract the entire decryption routine,
which might be hard per se.

5.2 Weak White-Box Cipher Proposal 

In this section we propose a blockcipher family which conforms to the weak
white-box security notion, so that it is computationally infeasible to derive a key
or any other compact secret information from the white-box implementation.

We further say that the white-box implementation is memory-hard if it
requires a pre-specified and large enough amount of memory, in the spirit of
memory-hard key-derivation functions [44]. This concept is even stronger than
T-secure weak white-box implementations, as an adversary is unable to reduce
the implementation size at all and thus would have to publish the entire set of
lookup tables. In contrast to earlier white-box designs, we offer a set of ciphers
with a wide range of memory requirements.

Our memory-hard cipher consists of a number of smaller components, which are
exposed as lookup tables in the white-box implementation. Each component is
either a small-block ASASA cipher, adapted from the construction in Section 4, or
just a single S-box. The S-boxes are at least 8 bits wide to avoid equivalence
problems [9], but 10- and 12-bit ones are also used. All S-boxes and affine layers
are derived in a deterministic way from the secret key. In fact, it is enough to
have linear, not affine, layers, since the constant can be kept in the S-boxes. To
estimate memory requirements, we assume for simplicity that each table output
fits an integer number of bytes (e.g., 2 bytes for 10-bit S-boxes).

We propose the SPN structure for the cipher, i.e. we alternate layers of smaller
ciphers (denote their number by R) with a public linear transformation C. Any
transformation with good diffusion should be fine. Recalling that AES can be
partitioned into 5 rounds with four 32-bit Super S-boxes in each, we propose R layers
for a similar security margin. The cipher's pseudocode is as follows (Figure 3):
Fig. 3. Blockcipher family for weak white-box security
- Repeat R times:
  • Apply R parallel ASASA-based distinct blockciphers;
  • Apply the linear transformation C to the entire state.
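The pseudocode can be sketched in Python with lists as the lookup tables. This is an illustration under stated assumptions, not the paper's instantiation. Random permutations stand in for the key-derived ASASA components, a bit rotation is a placeholder for the public linear map C (the text only asks for good diffusion), and the component width and number of layers are toy values. All names are hypothetical.

```python
import random

WIDTH = 16          # component width in bits (toy value, assumption)
COMPONENTS = 2      # components per layer, so the toy block is 32 bits
ROUNDS = 4          # number of layers R (toy value, assumption)
BLOCK = WIDTH * COMPONENTS
BLOCK_MASK = (1 << BLOCK) - 1
COMP_MASK = (1 << WIDTH) - 1
ROT = WIDTH // 2    # placeholder public linear layer: rotate by half a component


def make_tables(seed):
    """Random permutations stand in for the key-derived ASASA component tables."""
    rng = random.Random(seed)
    tables = []
    for _ in range(ROUNDS):
        layer = []
        for _ in range(COMPONENTS):
            t = list(range(1 << WIDTH))
            rng.shuffle(t)
            layer.append(t)
        tables.append(layer)
    return tables


def invert_tables(tables):
    inv = []
    for layer in tables:
        inv_layer = []
        for t in layer:
            u = [0] * len(t)
            for x, y in enumerate(t):
                u[y] = x
            inv_layer.append(u)
        inv.append(inv_layer)
    return inv


def C(x):
    return ((x << ROT) | (x >> (BLOCK - ROT))) & BLOCK_MASK


def C_inv(x):
    return ((x >> ROT) | (x << (BLOCK - ROT))) & BLOCK_MASK


def lookup_layer(layer, x):
    """Apply the parallel components, each to its own WIDTH-bit slice of the state."""
    return sum(t[(x >> (WIDTH * i)) & COMP_MASK] << (WIDTH * i) for i, t in enumerate(layer))


def encrypt(tables, x):
    for layer in tables:                 # repeat R times
        x = C(lookup_layer(layer, x))    # parallel components, then public linear map C
    return x


def decrypt(inv_tables, y):
    for layer in reversed(inv_tables):
        y = lookup_layer(layer, C_inv(y))
    return y
```

In this model the white-box implementation is just `tables`. Its size, here 8 tables of $2^{16}$ two-byte entries (1 MiB), is what the memory-hardness notion is about.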

We outline specific parameters and memory requirements for 64- and 128-bit
blockciphers in Table 2. The 16-bit S layer has two 8-bit S-boxes, the 18-bit one
an 8-bit and a 10-bit S-box, the 20-bit one two 10-bit S-boxes, and the 24-bit one
three 8-bit S-boxes. We see that whereas the black-box implementation takes a few
dozen KBytes, the white-box implementation can be made large enough, in the range
from 2 MBytes to several GBytes.
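The white-box sizes in Table 2 follow from its stated convention that an n-bit component occupies $\lceil n/8 \rceil \cdot 2^n$ bytes. The small calculator below reproduces them. The helper names are hypothetical, and sizes are printed in binary mebibytes.

```python
import math

MiB = 1 << 20
GiB = 1 << 30


def component_bytes(bits):
    """Table 2 convention: an n-bit component occupies ceil(n/8) * 2^n bytes."""
    return math.ceil(bits / 8) * (1 << bits)


def white_box_bytes(rows, widths):
    """Total table size for `rows` layers, each with components of the given bit widths."""
    return rows * sum(component_bytes(w) for w in widths)


CONFIGS = {
    "64-bit, 4 x 16": (4, [16] * 4),
    "64-bit, 3 x 18 + 10": (4, [18] * 3 + [10]),
    "128-bit, 8 x 16": (8, [16] * 8),
    "128-bit, 24 + 6 x 16 + 8": (8, [24] + [16] * 6 + [8]),
    "128-bit, 4 x 28 + 16": (5, [28] * 4 + [16]),
}

for name, (rows, widths) in CONFIGS.items():
    print(f"{name:26s} {white_box_bytes(rows, widths) / MiB:10.1f} MiB")
```

For the 24-bit configuration the calculator reports about 390 MiB. The 24-bit component alone accounts for 384 MiB over eight rows, and the 16-bit and 8-bit components add the rest.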


Table 2. Parameters and memory requirements of white-box and black-box
implementations for the 64- and 128-bit blockciphers. We assume that an n-bit
component occupies $\lceil n/8 \rceil \cdot 2^n$ bytes of memory in the white-box implementation.

Block   | Rows | Component | Components in row             | Security level (bits) | White-box memory | Black-box memory
64-bit  | 4    | ASASA     | 4 × (16-bit)                  | 64                    | 2 MB             | 16 KB
64-bit  | 4    | ASASA     | 3 × (18-bit) + 10-bit         | 64                    | 9 MB             | 32 KB
128-bit | 8    | ASASASA   | 8 × (16-bit)                  | 128                   | 8 MB             | 96 KB
128-bit | 8    | ASASA     | 24-bit + 6 × (16-bit) + 8-bit | 64                    | 384 MB           | 64 KB
128-bit | 5    | ASASASA   | 4 × (28-bit) + 16-bit         | 128                   | 20 GB            | 130 KB


Table 3. Our schemes in comparison, along with (presumably) secure parameters for
UOV and HFE

Scheme             | Field              | # vars | # polys | Degree | Private key | PK/white-box | Ref.
Black-box ASASA    | $\mathbb{F}_2$     | 128    | -       | -      | 14 KB       | 196 KB       | Sec. 4
χ-scheme*          | $\mathbb{F}_2$     | 127    | 127     | 4      | 8 KB        | 300 MB       | Sec. 2
Expanding scheme   | $\mathbb{F}_{16}$  | 32     | 128     | 4      | 16 KB       | 24 MB        | Sec. 2
Memory-hard cipher | $\mathbb{F}_2$     | 64     | -       | -      | 16-32 KB    | 2-8 MB       | Sec. 5
Memory-hard cipher | $\mathbb{F}_2$     | 128    | -       | -      | 64-130 KB   | 8 MB-20 GB   | Sec. 5
HFE                | $\mathbb{F}_{16}$  | 64     | 64      | 2      | 48 KB       | 520 KB       | [?]
UOV                | $\mathbb{F}_{256}$ | 78     | 26      | 2      | 71 KB       | 80 KB        | [?]

* Several variants broken in this paper.


Security Analysis. Our ASASA components have a very small block and only a
few S-boxes in the S layer. Some attacks that are infeasible on a 128-bit block
may have practical complexity on a 16-bit block. The best attack we could find
was presented in Section 4.2 and has complexity $2^{(n-m)m}$ for m-bit S-boxes and
an n-bit block. As a result, the 16-bit ASASA components with 8-bit S-boxes
have a maximum security level of 64 bits, the 20-bit components (with 10-bit
S-boxes) 100 bits, and the 24-bit components with 8-bit S-boxes 128 bits.

An easy way to increase the security level is to add two more layers, thus
producing ASASASA components. This yields a 50% increase in the private key
size, but no increase in the white-box implementation size. Since we have not
found a way to extend our attack to this structure, we conjecture its security
level to be 128 bits. In Table 2 we provide both variants so that a protocol designer
may choose between them according to their own requirements.

6 Conclusion 

We have explored in depth the state of the art in black-box, white-box, and
multivariate public-key cryptography, and concluded that the ASASA structure is
the minimal generic construction which is still unbroken. We constructed cipher
candidates for all these settings. We designed two ASASA schemes for public-key
cryptography based on multivariate polynomials. We showed how to avoid existing
attacks on multivariate schemes, including the recent powerful decomposition
algorithms, by adding appropriate perturbation functions. In the traditional
black-box setting we offered a cryptanalytic challenge of a fast cipher with small
random S-boxes and random affine layers.

We proposed several solutions for white-box cryptography, under both weak and
strong security notions. In the weak model, we designed a memory-hard cipher,
which prevents key extraction and requires an adversary to spend a large,
predefined amount of memory. It is based on small ASASA components. We showed
how our multivariate schemes can be used as strong white-box implementations,
as they are not invertible without the key and allow fast encryption and
decryption for legitimate users. We compare the implementation size of our schemes
with other unbroken MQ-systems in Table 3.

Our findings indicate a number of future research directions. First, it would be
interesting to explore algorithmic large S-boxes in the black-box ASASA structure,
e.g. instantiated with recently found permutation polynomials. Secondly, a
theory of perturbation layers as a countermeasure to generic decomposition
algorithms needs to be developed, possibly along the lines of LWE (Learning with
Errors). Thirdly, we suggest investigating the actual security level of small-block
(16-, 20-, 24-bit) ASASA schemes to figure out which components are suitable
for weak white-box implementations. Finally, an open question is to develop
constructions with smaller descriptions (e.g., within 1 MByte) which are bijective,
suitable for digital signatures, and allow strong white-box implementations.
Abstract. The Sponge function is known to achieve $2^{c/2}$ security, where
c is its capacity. This bound was carried over to keyed variants of the
function, such as Sponge Wrap, to achieve a $\min\{2^{c/2}, 2^\kappa\}$ security bound,
with κ the key length. Similarly, many CAESAR competition submissions
are designed to comply with the classical $2^{c/2}$ security bound. We
show that Sponge-based constructions for authenticated encryption can
achieve the significantly higher bound of $\min\{2^{b/2}, 2^c, 2^\kappa\}$ asymptotically,
with b > c the permutation size, by proving that the CAESAR
submission NORX achieves this bound. Furthermore, we show how to
apply the proof to five other Sponge-based CAESAR submissions: Ascon,
CBEAM/STRIBOB, ICEPOLE, Keyak, and two out of the three
PRIMATEs. A direct application of the result shows that the parameter
choices of these submissions are overly conservative. Simple tweaks render
the schemes considerably more efficient without sacrificing security.
For instance, NORX64 can increase its rate and decrease its capacity
by 128 bits, and Ascon-128 can encrypt three times as fast, both without
affecting the security level of their underlying modes in the ideal
permutation model.

Keywords: Authenticated encryption, CAESAR, Ascon, CBEAM, ICEPOLE,
Keyak, NORX, PRIMATEs, STRIBOB.

1 Introduction 

Authenticated encryption schemes, cryptographic functions that aim to provide
both privacy and integrity of data, have gained renewed attention in light of the
recently commenced CAESAR competition [1]. A common approach to building
such schemes is to design a mode of operation for a block cipher, as in CCM [2],
OCB1-3 [3,4,5], and EAX [6]. Nevertheless, a significant fraction of the CAESAR
competition submissions use modes of operation for permutations.

Most of the permutation-based modes follow the basic design of the Sponge
construction [7]: their output is computed from a state value, which in turn is
repeatedly updated using the key, nonce, associated data, and plaintext by calling a
permutation. The state is divided into a rate part of r bits, through which the
user enters plaintext, and a capacity part of c bits, which is out of the user's
control.

The security of the Sponge construction as a hash function follows from the
fact that the user can only affect the rate; hence an adversary only succeeds with
significant probability if it makes on the order of $2^{c/2}$ permutation queries, as
this many are needed to produce a collision in the capacity [7]. Keyed versions
of the Sponge construction, such as KeyedSponge [8] and Sponge Wrap [9], are
proven up to a similar bound of $2^{c-a}$, assuming a limit of $2^a$ on online complexity,
but are additionally restricted by the key size κ to $2^\kappa$. The permutation-based
CAESAR candidates are no exception and recommend parameters based on
either the $2^{c/2}$ bound or the $2^{c-a}$ bound, as shown in Table 1.
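The $2^{c/2}$ figure is the birthday bound on the capacity. The sketch below models capacity values as independent uniform c-bit strings, which is an idealizing assumption, and measures how many draws are needed before one repeats. The average should be close to $\sqrt{\pi 2^c / 2}$, i.e. on the order of $2^{c/2}$. Function names are hypothetical.

```python
import math
import random


def draws_until_collision(c, rng):
    """Sample uniform c-bit capacity values until one repeats; return how many were drawn."""
    seen = set()
    while True:
        value = rng.getrandbits(c)
        if value in seen:
            return len(seen) + 1
        seen.add(value)


def mean_draws(c, trials=500, seed=2014):
    rng = random.Random(seed)
    return sum(draws_until_collision(c, rng) for _ in range(trials)) / trials


c = 16
print(f"c={c}: mean {mean_draws(c):.1f} draws, sqrt(pi/2 * 2^c) = {math.sqrt(math.pi / 2 * 2**c):.1f}")
```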

1.1 Our Results 

Contrary to intuition, a wide range of permutation-based authenticated encryption
schemes achieve a significantly higher mode security level: we prove
that the bound is limited by approximately $\min\{2^{(r+c)/2}, 2^c, 2^\kappa\}$ as opposed to
$\min\{2^{c/2}, 2^\kappa\}$. The main proof in this work concerns the NORX mode [10], but we
demonstrate its applicability to the CAESAR submissions Ascon [11], CBEAM¹
[12,13,14], ICEPOLE [15], Keyak [16], two out of three PRIMATEs [17], and
STRIBOB² [19,20]. Additionally, we note that it directly applies to Sponge Wrap [9]
and Duplex Wrap [?], upon which Keyak is built.

Our results imply that all of these CAESAR candidates have been overly
conservative in choosing their parameters, since a smaller capacity would have led
to the same bound. For instance, Ascon-128 could take (c, r) = (128, 192) instead
of (256, 64), NORX64 (the proposed mode with 256-bit security) could increase
its rate by 128 bits, and GIBBON-120 and HANUMAN-120 could increase their
rate by a factor of 4, all without affecting their mode security levels.
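The Ascon-128 and NORX64 examples can be checked against the asymptotic form of the bound. The sketch compares the exponent of the classical $\min\{2^{c/2}, 2^\kappa\}$ with that of $\min\{2^{b/2}, 2^c, 2^\kappa\}$, as quoted in the abstract, for the current and the tweaked capacities. It deliberately ignores constant factors and the factor r in the offline term, so it is a rough reading rather than the exact theorem. Names are hypothetical.

```python
def classical_bits(c, kappa):
    """log2 of min{2^(c/2), 2^kappa}: the classical Sponge-style mode bound."""
    return min(c / 2, kappa)


def improved_bits(b, c, kappa):
    """log2 of min{2^(b/2), 2^c, 2^kappa}: the asymptotic bound from the abstract."""
    return min(b / 2, c, kappa)


# (b, c, kappa) triples; the tweaked capacities follow the examples in the text.
PARAMS = {
    "Ascon-128 (c=256)": (320, 256, 128),
    "Ascon-128 tweak (c=128)": (320, 128, 128),
    "NORX64 (c=384)": (1024, 384, 256),
    "NORX64 tweak (c=256)": (1024, 256, 256),
}

for name, (b, c, kappa) in PARAMS.items():
    print(f"{name:26s} classical 2^{classical_bits(c, kappa):g}  "
          f"improved 2^{improved_bits(b, c, kappa):g}")
```

Under the classical reading, halving Ascon-128's capacity would drop it to 64 bits. Under the improved bound it stays at 128 bits, which is why the rate can triple from 64 to 192 bits.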

These observations only concern mode security, where characteristics of
the underlying permutation are set aside. Specifically, the concrete security of
the underlying permutations plays a fundamental role in the choice of
parameters. For instance, the authors of Ascon, NORX, and PRIMATEs [10,11,17]
acknowledge that non-random properties of some of the underlying primitives
exist. Although these properties are harmless, a non-hermetic design approach
for the primitives affects the parameter choices.

1.2 Outline 

We present our security model in Section 2. A security proof for NORX is derived
in Section 3. In Section 4 we show that the proof for NORX generalizes to other
CAESAR submissions, as well as to Sponge Wrap and Duplex Wrap. The work is
concluded in Section 5, where we also discuss possible generalizations to Artemia
[21] and π-Cipher [22].

1 CBEAM was withdrawn after an attack by Minaud [?], but we focus on modes of
operation.

2 Both CBEAM and STRIBOB use the BLNK Sponge mode [18].


Beyond $2^{c/2}$ Security in Sponge-Based Authenticated Encryption Modes




Table 1. Parameters and the achieved mode security levels of seven CAESAR
submissions. We remark that ICEPOLE consists of three configurations (two with
security level 128 and one with security level 256) and Keyak of four configurations
(one with an 800-bit state and three with a 1600-bit state).

Scheme              | b    | c   | r    | κ        | τ   | security
Ascon [11]          | 320  | 192 | 128  | 96       | 96  | 96
                    | 320  | 256 | 64   | 128      | 128 | 128
CBEAM [?]           | 256  | 190 | 66   | 128      | 64  | 128
ICEPOLE [15]        | 1280 | 254 | 1026 | 128      | 128 | 128
                    | 1280 | 318 | 962  | 256      | 128 | 256
Keyak [16]          | 800  | 252 | 548  | 128..224 | 128 | 128..224
                    | 1600 | 252 | 1348 | 128..224 | 128 | 128..224
NORX [10]           | 512  | 192 | 320  | 128      | 128 | 128
                    | 1024 | 384 | 640  | 256      | 256 | 256
GIBBON/HANUMAN [17] | 200  | 159 | 41   | 80       | 80  | 80
                    | 280  | 239 | 41   | 120      | 120 | 120
STRIBOB [19]        | 512  | 254 | 258  | 192      | 128 | 192


2 Security Model 

For $n \in \mathbb{N}$, let Perm(n) denote the set of all permutations on n bits. When
writing $x \xleftarrow{\$} \mathcal{X}$ for some finite set $\mathcal{X}$, we mean that x gets sampled uniformly at
random from $\mathcal{X}$. For $x \in \{0,1\}^n$ and $a, b \le n$, we denote by $\lceil x \rceil_a$ and $\lfloor x \rfloor_b$ the a
leftmost and b rightmost bits of x, respectively. For tuples $(j, k)$, $(j', k')$ we use
lexicographical order: $(j, k) > (j', k')$ means that $j > j'$, or $j = j'$ and $k > k'$.

Let Π be an authenticated encryption scheme, which is specified by an
encryption function $\mathcal{E}$ and a decryption function $\mathcal{D}$:

$$(C, A) \leftarrow \mathcal{E}_K(N; H, M, T) \quad\text{and}\quad M/\bot \leftarrow \mathcal{D}_K(N; H, C, T; A).$$

Here N denotes a nonce value, H a header, M a message, C a ciphertext, T
a trailer, and A an authentication tag. The values (H, T) will be referred to
as associated data. If verification is correct, then the decryption function $\mathcal{D}_K$
outputs M, and ⊥ otherwise. The scheme Π is also determined by a set of
parameters such as the key size, state size, and block size, but these are left
implicit. In addition, we define $\$$ to be an ideal version of $\mathcal{E}_K$, where $\$$ returns
$(C, A) \xleftarrow{\$} \{0,1\}^{|M|+\tau}$ on input of a new $(N; H, M, T)$.

We follow the convention in analyzing modes of operation for permutations
by modeling the underlying permutations as being drawn uniformly at random
from Perm(b), where b is a parameter determined by the scheme. We note that
irregularities in the underlying permutation may invalidate this assumption.

An adversary $\mathcal{A}$ is a probabilistic algorithm that has access to one or more
oracles $\mathcal{O}$, denoted $\mathcal{A}^{\mathcal{O}}$. By $\mathcal{A}^{\mathcal{O}} = 1$ we denote the event that $\mathcal{A}$, after interacting
with $\mathcal{O}$, outputs 1. We consider adversaries $\mathcal{A}$ that have unbounded computational
power and whose complexities are solely measured by the number of
queries made to their oracles. These adversaries have query access to the underlying
idealized permutations, $\mathcal{E}_K$ or its counterpart $\$$, and possibly $\mathcal{D}_K$. The key
K is randomly drawn from $\{0,1\}^\kappa$ at the beginning of the security experiment.
The security definitions below follow [23,24].

Privacy 

Let p denote the list of underlying idealized permutations of Π. We define the
advantage of an adversary $\mathcal{A}$ in breaking the privacy of Π as follows:

$$\mathbf{Adv}^{\mathrm{priv}}_{\Pi}(\mathcal{A}) = \left|\Pr_{p,K}\left(\mathcal{A}^{p^{\pm},\mathcal{E}_K} = 1\right) - \Pr_{p,\$}\left(\mathcal{A}^{p^{\pm},\$} = 1\right)\right|,$$

where the probabilities are taken over the random choices of p, $\$$, K, and the
random choices of $\mathcal{A}$, if any. The fact that the adversary has access to both
the forward and inverse permutations in p is denoted by $p^{\pm}$. We assume that
adversary $\mathcal{A}$ is nonce-respecting, which means that it never makes two queries
to $\mathcal{E}_K$ or $\$$ with the same nonce. By $\mathbf{Adv}^{\mathrm{priv}}_{\Pi}(q_p, q_{\mathcal{E}}, \lambda_{\mathcal{E}})$ we denote the maximum
advantage taken over all adversaries that query $p^{\pm}$ at most $q_p$ times, and that
make at most $q_{\mathcal{E}}$ queries of total length at most $\lambda_{\mathcal{E}}$ blocks to $\mathcal{E}_K$ or $\$$. We remark
that this privacy notion is also known as the CPA security of an (authenticated)
encryption scheme.

Integrity 

As above, let p denote the list of underlying idealized permutations of Π. We
define the advantage of an adversary $\mathcal{A}$ in breaking the integrity of Π as follows:

$$\mathbf{Adv}^{\mathrm{auth}}_{\Pi}(\mathcal{A}) = \Pr_{p,K}\left(\mathcal{A}^{p^{\pm},\mathcal{E}_K,\mathcal{D}_K} \text{ forges}\right),$$

where the probability is taken over the random choices of p, K, and the random
choices of $\mathcal{A}$, if any. Here, we say that "$\mathcal{A}$ forges" if $\mathcal{D}_K$ ever returns a valid
message (other than ⊥) on input of $(N; H, C, T; A)$ where $(C, A)$ has never been
output by $\mathcal{E}_K$ on input of a query $(N; H, M, T)$ for some M. We assume that
adversary $\mathcal{A}$ is nonce-respecting, which means that it never makes two queries to
$\mathcal{E}_K$ with the same nonce. Nevertheless, $\mathcal{A}$ is allowed to repeat nonces in decryption
queries. By $\mathbf{Adv}^{\mathrm{auth}}_{\Pi}(q_p, q_{\mathcal{E}}, \lambda_{\mathcal{E}}, q_{\mathcal{D}}, \lambda_{\mathcal{D}})$ we denote the maximum advantage
taken over all adversaries that query $p^{\pm}$ at most $q_p$ times, that make at most $q_{\mathcal{E}}$
queries of total length at most $\lambda_{\mathcal{E}}$ blocks to $\mathcal{E}_K$, and at most $q_{\mathcal{D}}$ queries of total
length at most $\lambda_{\mathcal{D}}$ blocks to $\mathcal{D}_K$.
3 NORX

We introduce NORX at the level required for understanding the security
proof, and refer to Aumasson et al. [10] for the formal specification. Let p be a
permutation on b bits. All b-bit state values are split into a rate part of r bits and
a capacity part of c bits. We denote the key size of NORX by κ bits, the nonce size
by ν bits, and the tag size by τ bits. The header, message, and trailer can be of
arbitrary length, and are padded using $10^*1$-padding to a length that is a multiple of
r bits. Throughout, we denote the r-bit header blocks by $H_1, \ldots, H_u$, message
blocks by $M_1, \ldots, M_v$, ciphertext blocks by $C_1, \ldots, C_v$, and trailer blocks by
$T_1, \ldots, T_w$.

Unlike other permutation-based schemes, NORX allows for parallelism in the
encryption part, which is described using a parameter $D \in \{0, \ldots, 255\}$
corresponding to the number of parallel chains. Specifically, if $D \in \{1, \ldots, 255\}$,
NORX has D parallel chains, and if D = 0 it has v parallel chains, where v is
the block length of M or C.

NORX consists of five proposed parameter configurations: NORXW-R-D for
$(W, R, D) \in \{(64,4,1), (32,4,1), (64,6,1), (32,6,1), (64,4,4)\}$. The parameter R
denotes the number of rounds of the underlying permutation p, and W denotes
the word size, which we use to set r = 10W and c = 6W. The default key and
tag sizes are κ = τ = 4W. The corresponding parameters for the two different
choices of W, 64 and 32, are given in Table 1.

Although NORX starts with an initialization function init which requires the
parameters (D, R, τ) as input, as soon as our security experiment starts we
consider (D, R, τ) fixed and constant. Hence we can view init as a function that
maps (K, N) to $(K \| N \| 0^{b-\kappa-\nu}) \oplus \mathrm{const}$, where const is irrelevant to the mode
security analysis of NORX and will be ignored in the remaining analysis.

After init is called, the header H is compressed into the rate, then the state
is branched into D states (if necessary), the message blocks are encrypted in a
streaming way, the D states are merged into one state (if necessary), the trailer
is compressed, and finally the tag A is computed. All rounds are preceded by a
domain separation constant XORed into the capacity: 01 for header compression,
02 for message encryption, 04 for trailer compression, and 08 for tag generation.
If D ≠ 1, domain separators 10 and 20 are used for branching and merging,
along with pairwise distinct lane indices $\mathrm{id}_k$ for k = 1, ..., D (if D = 1 we write
$\mathrm{id}_1 = 0$).

The privacy of NORX is proven in Section 3.1 and the integrity in Section 3.2.
In both proofs we consider an adversary that makes $q_p$ permutation queries and
$q_{\mathcal{E}}$ encryption queries of total length $\lambda_{\mathcal{E}}$. In the proof of integrity, the adversary
can additionally make $q_{\mathcal{D}}$ decryption queries of total length $\lambda_{\mathcal{D}}$. To aid the
analysis, we compute the number of permutation calls made via the $q_{\mathcal{E}}$ encryption
queries. The exact same computation holds for decryption queries, with the
parameters defined analogously.
Fig. 1. NORX with D = 2


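Consider a query to $\mathcal{E}_K$ consisting of u header blocks, v message blocks, and
w trailer blocks. We denote its corresponding state values by

$$\left( s^{\mathrm{init}},\; s^{H}_{0}, \ldots, s^{H}_{u},\; \begin{matrix} s^{M}_{1,0}, \ldots, s^{M}_{1,v_1} \\ \vdots \\ s^{M}_{D,0}, \ldots, s^{M}_{D,v_D} \end{matrix},\; s^{T}_{0}, \ldots, s^{T}_{w},\; s^{\mathrm{tag}} \right) \qquad (1)$$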


as outlined in Figure 1. Here, $\sum_{k=1}^{D} v_k = v$. If there are no branching and
merging phases, i.e. D = 1, then the state values corresponding to the branching
and merging, $\{s^{M}_{1,0}, \ldots, s^{M}_{D,0}\}$ and $s^{T}_{0}$, are left out of the tuple. Note that the
length of this tuple equals the number of primitive calls made in this encryption
query, as every state value corresponds to the input of exactly one primitive call.
A simple calculation shows that if the j-th $\mathcal{E}_K$ query is of length u + v + w blocks,
it results in u + v + w + 3 state values if D = 1, in u + v + w + D + 4 state values
if D > 1, and in u + 2v + w + 4 state values if D = 0 (see footnote 3). We denote
the number of state values by $\sigma_{\mathcal{E},j}$, where the dependence on D is suppressed as D
does not change during the security game. In other words, $\sigma_{\mathcal{E},j}$ denotes the number
of primitive calls in the j-th query to $\mathcal{E}_K$. Furthermore, we define $\sigma_{\mathcal{E}}$ to be the total
number of primitive evaluations via the encryption queries, and find that

$$\sigma_{\mathcal{E}} = \sum_{j=1}^{q_{\mathcal{E}}} \sigma_{\mathcal{E},j} \le \begin{cases} 2\lambda_{\mathcal{E}} + 4q_{\mathcal{E}}, & \text{if } D = 0,\\ \lambda_{\mathcal{E}} + 3q_{\mathcal{E}}, & \text{if } D = 1,\\ \lambda_{\mathcal{E}} + (D+4)q_{\mathcal{E}}, & \text{if } D > 1. \end{cases} \qquad (2)$$

This bound is rather tight. In particular, for D = 0 an adversary can meet this
bound by only making queries without header and trailer. For queries to $\mathcal{D}_K$ we
define $\sigma_{\mathcal{D},j}$ and $\sigma_{\mathcal{D}}$ analogously.
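The per-query counts can be cross-checked by counting the entries of tuple (1) directly. In this sketch (hypothetical helper names), `state_values` transcribes the three per-query formulas. `state_values_from_tuple` counts the init, header, lane, trailer and tag values under our reading of tuple (1): the header contributes u + 1 values, each lane k contributes $v_k + 1$, and the trailer contributes w + 1, with the branching and merging values dropped when D = 1. `sigma_bound` is the right-hand side of (2).

```python
def state_values(u, v, w, D):
    """Primitive calls for one query with u header, v message and w trailer blocks."""
    if D == 0:
        return u + 2 * v + w + 4
    if D == 1:
        return u + v + w + 3
    return u + v + w + D + 4


def state_values_from_tuple(u, v, w, D):
    """Count the entries of tuple (1): init, header, message lanes, trailer, tag."""
    lanes = v if D == 0 else D           # D = 0 means one lane per message block
    header = u + 1                       # s^H_0 .. s^H_u
    message = v + lanes                  # lane k holds v_k + 1 values and sum(v_k) = v
    trailer = w + 1                      # s^T_0 .. s^T_w, where s^T_0 is the merging value
    total = 1 + header + message + trailer + 1
    if D == 1:
        total -= 2                       # no branching value s^M_{1,0} and no merging value s^T_0
    return total


def sigma_bound(lam, q, D):
    """Right-hand side of (2) for q queries of total length lam blocks."""
    if D == 0:
        return 2 * lam + 4 * q
    if D == 1:
        return lam + 3 * q
    return lam + (D + 4) * q
```

For D = 0 the bound is met with equality by queries without header and trailer. For example, two message-only queries of 3 and 5 blocks use 10 + 14 = 24 primitive calls, and $2 \cdot 8 + 4 \cdot 2 = 24$.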

3 For D = 0, the original specification dictates an additional $10^{b-2}1$-padding for every
complete message block. This means that lanes 1, ..., v − 1 consist of two rounds. We
do not take this padding into account, noting that it is unnecessary for the security
analysis.
3.1 Privacy of NORX

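Theorem 1. Let Π = ($\mathcal{E}$, $\mathcal{D}$) be NORX based on an ideal underlying primitive
p. Then,

$$\mathbf{Adv}^{\mathrm{priv}}_{\Pi}(q_p, q_{\mathcal{E}}, \lambda_{\mathcal{E}}) \le \frac{3(q_p + \sigma_{\mathcal{E}})^2}{2^{b+1}} + \left(\frac{8e\, q_p \sigma_{\mathcal{E}}}{2^b}\right)^{1/2} + \frac{r q_p}{2^c} + \frac{q_p + \sigma_{\mathcal{E}}}{2^\kappa},$$

where $\sigma_{\mathcal{E}}$ is defined in (2).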

Theorem 1 can be interpreted as implying that NORX provides privacy security
as long as the total complexity $q_p + \sigma_{\mathcal{E}}$ does not exceed $\min\{2^{b/2}, 2^\kappa\}$ and the
total number of primitive queries $q_p$, also known as the offline complexity, does
not exceed $2^c/r$. See Table 1 for the security level of the various parameter
choices of NORX.
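To see which term dominates, the four terms of the privacy bound can be evaluated in log scale. This sketch assumes the bound reads $3(q_p+\sigma_{\mathcal{E}})^2/2^{b+1} + (8e\,q_p\sigma_{\mathcal{E}}/2^b)^{1/2} + r q_p/2^c + (q_p+\sigma_{\mathcal{E}})/2^\kappa$ and takes (b, c, r, κ) from Table 1. Working with $\log_2$ values avoids floating-point overflow for exponents in the hundreds. Function names are hypothetical.

```python
import math


def log2_sum(*log_terms):
    """Return log2 of the sum of 2^t without overflow or underflow trouble."""
    top = max(log_terms)
    return top + math.log2(sum(2.0 ** (t - top) for t in log_terms))


def log2_priv_bound(log_qp, log_sigma, b, c, r, kappa):
    """log2 of the four-term privacy bound, with q_p and sigma_E given as log2 values."""
    log_total = log2_sum(log_qp, log_sigma)                          # log2(q_p + sigma_E)
    t1 = math.log2(3) + 2 * log_total - (b + 1)                      # 3(q_p+sigma)^2 / 2^(b+1)
    t2 = 0.5 * (math.log2(8 * math.e) + log_qp + log_sigma - b)      # (8e q_p sigma / 2^b)^(1/2)
    t3 = math.log2(r) + log_qp - c                                   # r q_p / 2^c
    t4 = log_total - kappa                                           # (q_p + sigma) / 2^kappa
    return log2_sum(t1, t2, t3, t4)


# NORX64: (b, c, r, kappa) = (1024, 384, 640, 256); NORX32: (512, 192, 320, 128)
print(round(log2_priv_bound(200, 200, 1024, 384, 640, 256), 1))
print(round(log2_priv_bound(100, 100, 512, 192, 320, 128), 1))
```

For both NORX configurations the key term $(q_p+\sigma_{\mathcal{E}})/2^\kappa$ dominates, and the bound becomes vacuous only once the total complexity approaches $2^\kappa$.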

The proof is based on the observation that NORX is indistinguishable from a
random scheme as long as there are no collisions among the (direct and indirect)
evaluations of p. Due to the uniqueness of the nonce, state values from evaluations
of $\mathcal{E}_K$ collide with probability approximately $1/2^b$. Regarding collisions between
direct calls to p and calls via $\mathcal{E}_K$: while these may happen with probability
about $1/2^c$, they turn out not to significantly influence the bound. The latter is
demonstrated in part using the principle of multiplicities [25]: roughly stated, the
multiplicity is the maximum number of state values with the same rate part. The
formal security proof is more detailed. Furthermore, we remark that, at the cost of
readability and simplicity of the proof, the bound could be improved by a constant factor.

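Proof. We consider any adversary $\mathcal{A}$ that has access to either $(p^{\pm}, \mathcal{E}_K)$ or $(p^{\pm}, \$)$
and whose goal is to distinguish these two worlds. For brevity, we write

$$\mathbf{Adv}^{\mathrm{priv}}_{\Pi}(\mathcal{A}) = \Delta_{\mathcal{A}}(p^{\pm}, \mathcal{E}_K;\, p^{\pm}, \$). \qquad (3)$$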

We start with replacing p± by a random function, as this simplifies the analysis. 
This is done with a PRP-PRF switch 26 . 2J . in which we make a transition 
from p± to a primitive f± defined as follows. This primitive f± maintains an 
initially empty list $\mathcal{T}$ of query/response tuples $(x, y)$. For $\mathcal{T}$, we denote its set
of domain and range values by $\mathrm{dom}(\mathcal{T})$ and $\mathrm{rng}(\mathcal{T})$, respectively. For a forward
query $f(x)$ with $x \in \mathrm{dom}(\mathcal{T})$, the corresponding value $y = \mathcal{T}(x)$ is returned. For
a new forward query $f(x)$, the response $y$ is randomly drawn from $\{0,1\}^b$; then,
if $y \in \mathrm{rng}(\mathcal{T})$ the primitive aborts, otherwise the tuple $(x, y)$ is added to $\mathcal{T}$.
The description for $f^{-1}$ is similar. The usage of $\mathcal{T}$ will remain implicit in the
remaining usage of $f^\pm$. Now, $p^\pm$ and $f^\pm$ behave identically as long as the latter
does not abort. Given that the adversary triggers at most $q_p + \sigma_{\mathcal{E}}$ evaluations of
$f$, such an abort happens with probability at most $\binom{q_p+\sigma_{\mathcal{E}}}{2}/2^b \le (q_p+\sigma_{\mathcal{E}})^2/2^{b+1}$.
This PRP-PRF switch needs to be applied to both the real and the ideal world, to
get

$$\Delta_{\mathcal{A}}(p^\pm, \mathcal{E}_K;\, p^\pm, \$) \le \Delta_{\mathcal{A}}(f^\pm, \mathcal{E}_K;\, f^\pm, \$) + \frac{(q_p+\sigma_{\mathcal{E}})^2}{2^b}. \qquad (4)$$
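The lazily sampled primitive $f^\pm$ used in this PRP-PRF switch can be sketched in Python as follows. This is an illustration only: $b$-bit strings are modelled as integers, `secrets.randbits` stands in for uniform sampling, and the class, exception and helper names are hypothetical. The two dictionaries together play the role of the list $\mathcal{T}$. `Abort` is raised exactly when a fresh response lands in $\mathrm{rng}(\mathcal{T})$ (forward) or in $\mathrm{dom}(\mathcal{T})$ (inverse).

```python
import secrets


class Abort(Exception):
    """Raised when f^{+-} aborts because a fresh response collides."""


class LazyAbortingFunction:
    """Lazily sampled f^{+-} on b-bit strings, modelled as Python ints."""

    def __init__(self, b: int):
        self.b = b
        self.fwd = {}  # dom(T) -> rng(T)
        self.inv = {}  # rng(T) -> dom(T)

    def forward(self, x: int) -> int:
        if x in self.fwd:                  # repeated query: answer from T
            return self.fwd[x]
        y = secrets.randbits(self.b)       # fresh uniform b-bit response
        if y in self.inv:                  # y already in rng(T): abort
            raise Abort("fresh output already in rng(T)")
        self.fwd[x], self.inv[y] = y, x
        return y

    def inverse(self, y: int) -> int:
        if y in self.inv:
            return self.inv[y]
        x = secrets.randbits(self.b)
        if x in self.fwd:                  # x already in dom(T): abort
            raise Abort("fresh input already in dom(T)")
        self.fwd[x], self.inv[y] = y, x
        return x


def abort_probability_bound(q: int, b: int) -> float:
    """binom(q, 2) / 2^b, which is at most q^2 / 2^(b+1); q = q_p + sigma_E."""
    return q * (q - 1) / 2 / 2 ** b
```

`abort_probability_bound` evaluates the per-world cost $\binom{q}{2}/2^b$ of the switch with $q = q_p + \sigma_{\mathcal{E}}$. Applying it to both worlds gives the additive term in the inequality above.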
We restrict our attention to $\mathcal{A}$ with oracle access to $(f^\pm, \mathcal{F})$, where $\mathcal{F} \in \{\mathcal{E}_K, \$\}$.
Without loss of generality, we can assume that the adversary only queries full
blocks and that no padding rules are involved. We can do this because the
padding rules are injective, allowing the proof to carry over to the case of
fractional blocks with $10^*1$-padding.

We introduce some terminology. Queries to $f^\pm$ are denoted $(x_i, y_i)$ for $i = 1, \dots, q_p$, while queries to $\mathcal{F}$ are written as elements $(N_j; H_j, M_j, T_j; C_j, A_j)$ for
$j = 1, \dots, q_{\mathcal{E}}$. If $\mathcal{F} = \mathcal{E}_K$, the state values are denoted as follows, subscripted with
a $j$:

$$\begin{pmatrix} & & s^M_{j,1,0}, \dots, s^M_{j,1,v_1} & & \\ s_j^{\mathrm{init}}, & s^H_{j,1}, \dots, s^H_{j,u}, & \vdots & s^T_{j,0}, \dots, s^T_{j,w}, & s_j^{\mathrm{tag}} \\ & & s^M_{j,D,0}, \dots, s^M_{j,D,v_D} & & \end{pmatrix} \qquad (5)$$

If the structure of (5) is irrelevant, we refer to the tuple as $(s_{j,1}, \dots, s_{j,\sigma_{\mathcal{E},j}})$,
where we use the convention to list the elements of the matrix column-wise.
In this case, we write $\mathrm{parent}(s_{j,k})$ to denote the state value that led to $s_{j,k}$,
with $\mathrm{parent}(s_{j,1}) := \emptyset$ and $\mathrm{parent}(s^T_{j,0}) := (s^M_{j,1,v_1}, \dots, s^M_{j,D,v_D})$. We remark that
the characteristic structure of NORX, with the $D$ parallel states, only becomes
relevant in the two technical lemmas that will be used at the end of the proof.
We point out that $s_{j,1}$ corresponds to the initial state value $s_j^{\mathrm{init}}$ of the evaluation,
which requires special attention throughout the remainder of the proof.
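The sketch below makes the column-wise flattening and the parent relation concrete by enumerating state labels for a single query. It assumes $u$ header states, $D$ lanes in which lane $k$ holds the states $0, \dots, v_k$, and trailer states $0, \dots, w$, with trailer state 0 acting as the merging state, followed by a tag state. It also assumes that every branching state has the last header state as its parent, or the initial state when $u = 0$. The string labels such as `"M1,0"` and the function name are invented for illustration.

```python
def state_labels(u: int, v: list, w: int):
    """Column-wise state labels and parent map for one query (hypothetical labels).

    u: number of header states, v[k-1] = v_k (lane k has states 0..v_k),
    w: index of the last trailer state (trailer states 0..w, state 0 merges).
    """
    labels, parent = ["init"], {"init": None}      # parent(s_{j,1}) is empty
    prev = "init"
    for i in range(1, u + 1):
        labels.append(f"H{i}")
        parent[f"H{i}"] = prev
        prev = f"H{i}"
    branch = prev                                  # feeds every lane
    for col in range(max(v) + 1):                  # column-wise over the lanes
        for k, v_k in enumerate(v, start=1):
            if col <= v_k:
                lab = f"M{k},{col}"
                labels.append(lab)
                parent[lab] = branch if col == 0 else f"M{k},{col - 1}"
    labels.append("T0")                            # merging state
    parent["T0"] = tuple(f"M{k},{v_k}" for k, v_k in enumerate(v, start=1))
    for i in range(1, w + 1):
        labels.append(f"T{i}")
        parent[f"T{i}"] = f"T{i - 1}"
    labels.append("tag")
    parent["tag"] = f"T{w}"
    return labels, parent
```

For example, `state_labels(2, [1, 0], 1)` lists the two lanes column by column. The merging state `"T0"` is the only label whose parent is a tuple, which mirrors the special definition of its parent.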

We define two collision events, guess and hit. Let $i \in \{1, \dots, q_p\}$, $j, j' \in \{1, \dots, q_{\mathcal{E}}\}$, $k \in \{1, \dots, \sigma_{\mathcal{E},j}\}$, and $k' \in \{1, \dots, \sigma_{\mathcal{E},j'}\}$:

$$\mathrm{guess}(i; j, k) \;=\; x_i = s_{j,k},$$
$$\mathrm{hit}(j, k; j', k') \;=\; \mathrm{parent}(s_{j,k}) \neq \mathrm{parent}(s_{j',k'}) \,\wedge\, s_{j,k} = s_{j',k'}.$$

Event guess$(i; j, k)$ corresponds to a primitive call in an encryption query hitting
a direct primitive query, or vice versa, while hit$(j, k; j', k')$ corresponds
to non-trivial primitive calls colliding in encryption queries. We write $\mathrm{guess} = \bigvee_{i;j,k} \mathrm{guess}(i; j, k)$, $\mathrm{hit} = \bigvee_{j,k;j',k'} \mathrm{hit}(j, k; j', k')$, and set $\mathrm{event} = \mathrm{guess} \vee \mathrm{hit}$.
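Both events can be phrased as simple predicates over a transcript. In this sketch a transcript is assumed to be given as plain Python values: the list of primitive inputs $x_i$, and each construction state stored together with the value of its parent (`None` for an initial state). The function names are hypothetical.

```python
from itertools import combinations


def guess_set(primitive_inputs, states) -> bool:
    """guess: some primitive input x_i equals some construction state s_{j,k}.

    `states` lists (value, parent_value) pairs over all encryption queries;
    parent_value is None for an initial state.
    """
    values = {value for value, _ in states}
    return any(x in values for x in primitive_inputs)


def hit_set(states) -> bool:
    """hit: two construction states are equal although their parents differ."""
    return any(
        v1 == v2 and p1 != p2
        for (v1, p1), (v2, p2) in combinations(states, 2)
    )


if __name__ == "__main__":
    states = [(5, None), (7, 5), (9, 7)]
    print(guess_set([1, 2], states), guess_set([7], states))  # False True
    print(hit_set(states + [(9, 3)]))                         # True
```

A repeated state with the same parent, such as appending `(9, 7)`, does not set hit. This is exactly the "non-trivial" restriction in the definition.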

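The remainder of the proof is divided as follows. In Lemma 1 we prove that
$(f^\pm, \mathcal{E}_K)$ and $(f^\pm, \$)$ are indistinguishable as long as $\neg\mathrm{event}$ holds. In other
words,

$$\Delta_{\mathcal{A}}(f^\pm, \mathcal{E}_K;\, f^\pm, \$) \le \Pr\left(\mathcal{A}^{f^\pm, \mathcal{E}_K} \text{ sets event}\right). \qquad (6)$$

Then, in Lemma 2 we bound this term by $\frac{q_p\sigma_{\mathcal{E}} + \sigma_{\mathcal{E}}^2/2}{2^b} + \left(\frac{8e q_p \sigma_{\mathcal{E}}}{2^b}\right)^{1/2} + \frac{r q_p}{2^c} + \frac{q_p + \sigma_{\mathcal{E}}}{2^\kappa}$.
Noting that $q_p\sigma_{\mathcal{E}} + \sigma_{\mathcal{E}}^2/2 \le (q_p + \sigma_{\mathcal{E}})^2/2$, this completes the proof via
(3), (4), and (6). □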


Lemma 1. Given that event does not occur, $(f^\pm, \mathcal{E}_K)$ and $(f^\pm, \$)$ are indistinguishable.
Proof. The outputs of $f^\pm$ are sampled uniformly at random in both $(f^\pm, \mathcal{E}_K)$
and $(f^\pm, \$)$, except when such an output collides with a state of an $\mathcal{E}_K$ evaluation
in the real world. However, this event is excluded by assuming $\neg\mathrm{guess}$, hence it
suffices to only consider queries to the big oracle $\mathcal{F} \in \{\mathcal{E}_K, \$\}$.

Let $N_j$ be a new nonce used in the $\mathcal{F}$-query $(N_j; H_j, M_j, T_j)$, with corresponding
ciphertext and authentication tag $(C_j, A_j)$. Denote the query's state values
as in (5). Let $u$, $v$, and $w$ denote the number of padded header blocks, padded
message blocks, and padded trailer blocks, respectively.

By the definition of $\$$, in the ideal world we have $(C_j, A_j) \xleftarrow{\$} \{0,1\}^{|M_j| + \tau}$. We
will prove that $(C_j, A_j)$ is identically distributed in the real world, under the
assumption that $\mathrm{guess} \vee \mathrm{hit}$ does not occur. Denote the message blocks of $M_j$ by
$M_{j,k,\ell}$ for $k = 1, \dots, D$ and $\ell = 1, \dots, v_k$.

We know that $s^H_{j,u}$ is new and that $f(s^H_{j,u})$ does not collide with any other $f$-query,
because otherwise $\neg\mathrm{event}$ would have been violated. Since $s^M_{j,k,0} = f(s^H_{j,u}) \oplus \mathrm{id}_k$,
we conclude that $s^M_{j,k,0}$ is new for $k = 1, \dots, D$, as otherwise event would
be set. Similarly, $s^M_{j,k,i}$ is new for all $i > 0$. The ciphertext blocks $C_{j,k,\ell}$ are
computed as

$$C_{j,k,\ell} = [f(s^M_{j,k,\ell-1})]^r \oplus M_{j,k,\ell}.$$

As the state value $s^M_{j,k,\ell-1}$ has not been evaluated by $f$ before (neither directly
nor indirectly via an encryption query), $f(s^M_{j,k,\ell-1})$ outputs a uniformly random
value from $\{0,1\}^b$, hence $C_{j,k,\ell} \xleftarrow{\$} \{0,1\}^r$. We remark that similar reasoning
shows that a ciphertext block corresponding to a truncated message block is
uniformly randomly drawn as well, yet from a smaller set. The fact that $A_j \xleftarrow{\$} \{0,1\}^\tau$
follows the same reasoning, using that $s_j^{\mathrm{tag}}$ is a new input to $f$. Thus,

$$A_j = [f(s_j^{\mathrm{tag}})]^\tau \xleftarrow{\$} \{0,1\}^\tau. \qquad \square$$


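Lemma 2. $\Pr\left(\mathcal{A}^{f^\pm, \mathcal{E}_K} \text{ sets event}\right) \le \frac{q_p\sigma_{\mathcal{E}} + \sigma_{\mathcal{E}}^2/2}{2^b} + \left(\frac{8e q_p \sigma_{\mathcal{E}}}{2^b}\right)^{1/2} + \frac{r q_p}{2^c} + \frac{q_p + \sigma_{\mathcal{E}}}{2^\kappa}$.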


Proof. Consider the adversary interacting with $(f^\pm, \mathcal{E}_K)$, and let $\Pr(\mathrm{guess} \vee \mathrm{hit})$
denote the probability we aim to bound. For $i \in \{1, \dots, q_p\}$, define

$$\mathrm{key}(i) \;=\; [x_i]^\kappa = K,$$

and $\mathrm{key} = \bigvee_i \mathrm{key}(i)$. Event key$(i)$ corresponds to a primitive query hitting the
key. Let $j \in \{1, \dots, q_{\mathcal{E}}\}$ and $k \in \{1, \dots, \sigma_{\mathcal{E},j}\}$, and consider any threshold $\rho \ge 1$;
then define

$$\mathrm{multi}(j, k) \;=\; \max_{a \in \{0,1\}^r} \Big|\big\{(j', k') \preceq (j, k),\ k' > 1 \,:\, a \in \{[s_{j',k'}]^r, [f(s_{j',k'})]^r\}\big\}\Big| > \rho,$$

where $(j', k') \preceq (j, k)$ means that $s_{j',k'}$ is evaluated no later than $s_{j,k}$.
Event multi$(j, k)$ is used to bound the number of states that collide in the rate
part. Note that state values $s_{j,1}$ are not considered here as they will be covered
by key. We define $\mathrm{multi} = \mathrm{multi}(q_{\mathcal{E}}, \sigma_{\mathcal{E},q_{\mathcal{E}}})$, which is a monotone event. By basic
probability theory,

$$\Pr(\mathrm{guess} \vee \mathrm{hit}) \le \Pr\big(\mathrm{guess} \vee \mathrm{hit} \mid \neg(\mathrm{key} \vee \mathrm{multi})\big) + \Pr(\mathrm{key} \vee \mathrm{multi}). \qquad (7)$$

In the remainder of the proof, we bound these probabilities as follows (a formal
explanation of the proof technique is given in Appendix A): we consider the $i$th
forward or inverse primitive query (for $i \in \{1, \dots, q_p\}$) or the $k$th state of the $j$th
construction query (for $j \in \{1, \dots, q_{\mathcal{E}}\}$ and $k \in \{1, \dots, \sigma_{\mathcal{E},j}\}$), and bound the
probability that this evaluation makes $\mathrm{guess} \vee \mathrm{hit}$ satisfied, under the assumption
that this query does not set $\mathrm{key} \vee \mathrm{multi}$ and also that $\mathrm{guess} \vee \mathrm{hit} \vee \mathrm{key} \vee \mathrm{multi}$
has not been set before. For the analysis of $\Pr(\mathrm{key} \vee \mathrm{multi})$ a similar technique
is employed.

Event guess. This event can be set in the $i$th primitive query (for $i = 1, \dots, q_p$)
or in any state evaluation of the $j$th construction query (for $j = 1, \dots, q_{\mathcal{E}}$).
Denote the state values of the $j$th construction query as in (5). Consider any
evaluation, assume this query does not set $\mathrm{key} \vee \mathrm{multi}$, and assume that $\mathrm{guess} \vee \mathrm{hit} \vee \mathrm{key} \vee \mathrm{multi}$
has not been set before. Firstly, note that $x_i = s_j^{\mathrm{init}}$ for some $i, j$ would
imply key$(i)$ and hence invalidate our assumption. Therefore, we can exclude
$s_j^{\mathrm{init}}$ from further analysis on guess. For $i = 1, \dots, q_p$, let $j_i \in \{0, \dots, q_{\mathcal{E}}\}$ be the
number of encryption queries made before the $i$th primitive query. Similarly, for
$j = 1, \dots, q_{\mathcal{E}}$, denote by $i_j \in \{0, \dots, q_p\}$ the number of primitive queries made
before the $j$th encryption query.

— Consider a primitive query $(x_i, y_i)$ for $i \in \{1, \dots, q_p\}$, which may be a forward
or an inverse query, and assume it has not been queried to $f^\pm$ before.
If it is a forward query, by $\neg\mathrm{multi}$ there are at most $\rho$ state values $s$ with
$[x_i]^r = [s]^r$, and thus $x_i = s$ with probability at most $\rho/2^c$. Here, we remark
that the capacity part of $s$ is unknown to the adversary and it guesses it with
probability at most $1/2^c$. A slightly more complicated reasoning applies for
inverse queries. Denote the query by $y_i$. By $\neg\mathrm{multi}$ there are at most $\rho$ state
values $s$ with $[y_i]^r = [f(s)]^r$, hence $y_i = f(s)$ with probability at most $\rho/2^c$.
If $y_i$ equals $f(s)$ for any of these states, then $x_i = s$; otherwise $x_i = s$ with
probability at most $\sum_{j=1}^{j_i} \sigma_{\mathcal{E},j}/2^b$. Therefore the probability that guess is set
via a direct query is at most $\frac{q_p\rho}{2^c} + \sum_{i=1}^{q_p}\sum_{j=1}^{j_i} \frac{\sigma_{\mathcal{E},j}}{2^b}$.

— Next, consider the probability that the $j$th construction query sets guess, for
$j \in \{1, \dots, q_{\mathcal{E}}\}$. For simplicity, first consider $D = 1$, hence the message is processed
in one lane and we can use state labeling $(s_{j,1}, \dots, s_{j,\sigma_{\mathcal{E},j}})$. We range
from $s_{j,2}$ to $s_{j,\sigma_{\mathcal{E},j}}$ (recall that $s_{j,1} = s_j^{\mathrm{init}}$ can be excluded) and consider the
probability that this state sets guess assuming it has not been set before. Let
$k \in \{2, \dots, \sigma_{\mathcal{E},j}\}$. The state value $s_{j,k}$ equals $f(s_{j,k-1}) \oplus v$, where $v$ is some
value determined by the adversarial input prior to the evaluation of $f(s_{j,k-1})$,
including input from $(H_j, M_j, T_j)$ and constants serving as domain separators.
By assumption, $\mathrm{guess} \vee \mathrm{hit}$ has not been set before, and $f(s_{j,k-1})$ is thus
randomly drawn from $\{0,1\}^b$. It hits any $x_i$ ($i \in \{1, \dots, i_j\}$) with probability
at most $i_j/2^b$. Next, consider the general case $D > 1$. We return to the labeling
of (5). A complication occurs for the branching states $s^M_{j,1,0}, \dots, s^M_{j,D,0}$
and the merging state $s^T_{j,0}$. Starting with the branching states, these are
computed from $s^H_{j,u}$ as

$$\begin{pmatrix} s^M_{j,1,0} \\ \vdots \\ s^M_{j,D,0} \end{pmatrix} = \begin{pmatrix} f(s^H_{j,u}) \\ \vdots \\ f(s^H_{j,u}) \end{pmatrix} \oplus \begin{pmatrix} v_1 \\ \vdots \\ v_D \end{pmatrix},$$

where $v_1, \dots, v_D$ are some distinct values determined by the adversarial input
prior to the evaluation of the $j$th construction query. These are distinct by
the XOR of the lane numbers $\mathrm{id}_1, \dots, \mathrm{id}_D$. Any of these nodes equals $x_i$
for $i \in \{1, \dots, i_j\}$ with probability at most $i_j D/2^b$. Finally, for the merging
node $s^T_{j,0}$ we can apply the same analysis, noting that it is derived from a
sum of $D$ new $f$-evaluations. Concluding, the $j$th construction query sets
guess with probability at most $i_j\sigma_{\mathcal{E},j}/2^b$ (we always have in total at most
$\sigma_{\mathcal{E},j}$ new state values). Summing over all $q_{\mathcal{E}}$ construction queries, we get
$\sum_{j=1}^{q_{\mathcal{E}}} i_j\sigma_{\mathcal{E},j}/2^b$.

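Concluding,

$$\Pr(\mathrm{guess} \mid \neg(\mathrm{key} \vee \mathrm{multi})) \le \frac{q_p\rho}{2^c} + \sum_{i=1}^{q_p}\sum_{j=1}^{j_i}\frac{\sigma_{\mathcal{E},j}}{2^b} + \sum_{j=1}^{q_{\mathcal{E}}}\frac{i_j\,\sigma_{\mathcal{E},j}}{2^b} = \frac{q_p\rho}{2^c} + \frac{q_p\sigma_{\mathcal{E}}}{2^b}.$$

Here we use that $\sum_{i=1}^{q_p}\sum_{j=1}^{j_i}\sigma_{\mathcal{E},j} + \sum_{j=1}^{q_{\mathcal{E}}} i_j\,\sigma_{\mathcal{E},j} = q_p\sigma_{\mathcal{E}}$, which follows from a
simple counting argument.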
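The counting argument can be checked numerically. Each pair $(i, j)$ of a primitive query and an encryption query contributes $\sigma_{\mathcal{E},j}$ exactly once: to the first sum if the encryption query came first, and to the second sum otherwise. The helper below is a hypothetical sketch that computes both sides of the identity for an arbitrary interleaving of the two query types.

```python
import random


def counting_identity(order, sigma):
    """Return both sides of the counting identity for one interleaving.

    order: sequence of "p" (primitive query) and "e" (encryption query);
    sigma[j - 1] = sigma_{E,j}, the number of states of the j-th encryption query.
    """
    lhs, enc_seen, prim_seen = 0, 0, 0
    for kind in order:
        if kind == "p":
            # i-th primitive query: add sigma_{E,1} + ... + sigma_{E,j_i}
            lhs += sum(sigma[:enc_seen])
            prim_seen += 1
        else:
            # j-th encryption query: add i_j * sigma_{E,j}
            lhs += prim_seen * sigma[enc_seen]
            enc_seen += 1
    return lhs, prim_seen * sum(sigma)   # right-hand side: q_p * sigma_E


if __name__ == "__main__":
    rng = random.Random(0)
    order = ["p"] * 5 + ["e"] * 4
    rng.shuffle(order)
    sigma = [rng.randint(1, 6) for _ in range(4)]
    print(order, counting_identity(order, sigma))
```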


Event hit. We again employ ideas of guess, and particularly that as long as
$\mathrm{guess} \vee \mathrm{hit}$ is not set, we can consider all new state values (except for the initial
states) to be randomly drawn from a set of size $2^b$. Particularly, we can refrain
from explicitly discussing the branching and merging nodes (the detailed analysis
of guess applies) and label the states as $(s_{j,1}, \dots, s_{j,\sigma_{\mathcal{E},j}})$. Clearly, $s_{j,1} \neq s_{j',1}$ for
all $j \neq j'$ by uniqueness of the nonce. Any state value $s_{j,k}$ for $k > 1$ (at most $\sigma_{\mathcal{E}} - q_{\mathcal{E}}$
in total) hits an initial state value $s_{j',1}$ only if $[s_{j,k}]^\kappa = K$, which happens with
probability at most $1/2^\kappa$ for each such state, assuming $s_{j,k}$ is generated randomly. Finally, any two
other states $s_{j,k}, s_{j',k'}$ for $k, k' > 1$ collide with probability at most $\binom{\sigma_{\mathcal{E}} - q_{\mathcal{E}}}{2}/2^b$.
Concluding, $\Pr(\mathrm{hit} \mid \neg(\mathrm{key} \vee \mathrm{multi})) \le \binom{\sigma_{\mathcal{E}}}{2}/2^b + \sigma_{\mathcal{E}}/2^\kappa$.

Event key. For $i \in \{1, \dots, q_p\}$, the query sets key$(i)$ if $[x_i]^\kappa = K$, which happens
with probability $1/2^\kappa$ (assuming it did not happen in queries $1, \dots, i-1$). The
adversary makes $q_p$ attempts, and hence $\Pr(\mathrm{key}) \le q_p/2^\kappa$.

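Event multi. We again use the principles from the analysis for guess of construction
queries (note that this part does not rely on multi itself). Particularly,
consider a new state value $s_{j,k}$; then for a fixed value $x \in \{0,1\}^b$ it
satisfies $f(s_{j,k}) = x$ or $s_{j,k} = f(s_{j,k-1}) \oplus v = x$ for some predetermined $v$
with probability at most $2/2^b$. Now, let $a \in \{0,1\}^r$; summing over the $2^c$
possible capacity parts, a single state value hits $a$ (that is, $a \in \{[s_{j,k}]^r, [f(s_{j,k})]^r\}$)
with probability at most $2/2^r$. More than $\rho$ state values
hit $a$ with probability at most $\binom{\sigma_{\mathcal{E}}}{\rho}\left(\frac{2}{2^r}\right)^\rho \le \left(\frac{2e\sigma_{\mathcal{E}}}{\rho 2^r}\right)^\rho$, using Stirling's approximation
($x! \ge (x/e)^x$ for any $x$). Considering any possible choice of $a$, we obtain

$$\Pr(\mathrm{multi}) \le 2^r \left(\frac{2e\sigma_{\mathcal{E}}}{\rho 2^r}\right)^\rho.$$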
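The Stirling step can be sanity-checked numerically. The sketch below evaluates the union-bound term $\binom{\sigma_{\mathcal{E}}}{\rho}(2/2^r)^\rho$ and its closed-form upper bound $(2e\sigma_{\mathcal{E}}/(\rho 2^r))^\rho$ for small parameters, together with the final union bound over all $2^r$ rate values. The function names and the sample values are hypothetical.

```python
from math import comb, e


def exact_term(sigma: int, rho: int, r: int) -> float:
    """binom(sigma, rho) * (2 / 2^r)^rho: union bound over rho-subsets of states."""
    return comb(sigma, rho) * (2 / 2 ** r) ** rho


def stirling_term(sigma: int, rho: int, r: int) -> float:
    """(2e*sigma / (rho*2^r))^rho, obtained from rho! >= (rho/e)^rho."""
    return (2 * e * sigma / (rho * 2 ** r)) ** rho


def multi_bound(sigma: int, rho: int, r: int) -> float:
    """Union bound over all 2^r possible rate values a."""
    return 2 ** r * stirling_term(sigma, rho, r)


if __name__ == "__main__":
    for rho in (1, 2, 4, 8):
        print(rho, exact_term(1000, rho, 8), stirling_term(1000, rho, 8))
```

The inequality holds because $\binom{\sigma}{\rho} \le \sigma^\rho/\rho!$ and $\rho! \ge (\rho/e)^\rho$. The check therefore only guards against transcription errors in the formula.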


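Addition of the four bounds via (7) gives

$$\Pr(\mathrm{guess} \vee \mathrm{hit}) \le \frac{q_p\sigma_{\mathcal{E}} + \sigma_{\mathcal{E}}^2/2}{2^b} + \frac{q_p\rho}{2^c} + \frac{q_p + \sigma_{\mathcal{E}}}{2^\kappa} + 2^r\left(\frac{2e\sigma_{\mathcal{E}}}{\rho 2^r}\right)^\rho.$$

Putting $\rho = \max\{r, \dots\}$ gives

$$\Pr(\mathrm{guess} \vee \mathrm{hit}) \le \frac{q_p\sigma_{\mathcal{E}} + \sigma_{\mathcal{E}}^2/2}{2^b} + \left(\frac{8e q_p \sigma_{\mathcal{E}}}{2^b}\right)^{1/2} + \frac{r q_p}{2^c} + \frac{q_p + \sigma_{\mathcal{E}}}{2^\kappa},$$

assuming $2e q_p \sigma_{\mathcal{E}}/2^b < 1$ (which we can do, as the bound would otherwise be
void anyway). This completes the proof. □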
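The two $\rho$-dependent terms of the bound before $\rho$ is fixed, $q_p\rho/2^c$ and $2^r(2e\sigma_{\mathcal{E}}/(\rho 2^r))^\rho$, can also be balanced numerically. The sketch below does this by brute force over integer $\rho$. It is a swapped-in illustration, not the closed-form choice of $\rho$ made in the proof, and the function names and example parameters are hypothetical. Passing $\log_2 q_p$ and $\log_2 \sigma_{\mathcal{E}}$ keeps the intermediate numbers within floating-point range.

```python
from math import e, log2


def rho_terms(log_qp: float, log_sigma: float, r: int, c: int, rho: int) -> float:
    """q_p*rho/2^c + 2^r * (2e*sigma_E/(rho*2^r))^rho, with q_p, sigma_E given in log2."""
    first = 2.0 ** (log_qp + log2(rho) - c)
    # log2 of the multi term: r + rho * log2(2e*sigma_E / (rho * 2^r))
    exponent = r + rho * (1 + log2(e) + log_sigma - log2(rho) - r)
    second = 2.0 ** exponent if exponent < 1000 else float("inf")
    return first + second


def best_rho(log_qp: float, log_sigma: float, r: int, c: int, rho_max: int = 10_000) -> int:
    """Brute-force the integer threshold rho in [1, rho_max] minimising rho_terms."""
    return min(range(1, rho_max + 1),
               key=lambda rho: rho_terms(log_qp, log_sigma, r, c, rho))


if __name__ == "__main__":
    # Hypothetical parameters: b = 512, c = 128, r = 384, q_p = sigma_E = 2^64.
    rho = best_rho(64, 64, 384, 128)
    print(rho, log2(rho_terms(64, 64, 384, 128, rho)))
```

For these hypothetical parameters the minimum is reached at $\rho = 2$, where the two terms sum to about $2^{-63}$. The closed-form choice in the proof instead trades a slightly larger value for a bound that holds uniformly.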


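3.2 Authenticity of NORX

Theorem 2. Let $\Pi = (\mathcal{E}, \mathcal{D})$ be NORX based on an ideal underlying primitive
$p$. Then,

$$\mathbf{Adv}^{\mathrm{auth}}_{\Pi}(q_p, q_{\mathcal{E}}, \lambda_{\mathcal{E}}, q_{\mathcal{D}}, \lambda_{\mathcal{D}}) \le \frac{(q_p + \sigma_{\mathcal{E}} + \sigma_{\mathcal{D}})^2}{2^b} + \left(\frac{8e q_p \sigma_{\mathcal{E}}}{2^b}\right)^{1/2} + \frac{r q_p}{2^c} + \frac{q_p + \sigma_{\mathcal{E}} + \sigma_{\mathcal{D}}}{2^\kappa} + \frac{(q_p + \sigma_{\mathcal{E}} + \sigma_{\mathcal{D}})\sigma_{\mathcal{D}}}{2^c} + \frac{q_{\mathcal{D}}}{2^\tau},$$

where $\sigma_{\mathcal{E}}, \sigma_{\mathcal{D}}$ are defined in (1).

The bound is more complex than the one of Theorem 1, but intuitively implies
that NORX offers integrity as long as it offers privacy and the number of forgery
attempts $\sigma_{\mathcal{D}}$ is limited, where the total complexity $q_p + \sigma_{\mathcal{E}} + \sigma_{\mathcal{D}}$ should not
exceed $2^c/\sigma_{\mathcal{D}}$. See Table 1 for the security level for the various parameter choices
of NORX. Needless to say, the exact bound is more fine-grained.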

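Proof. We consider any adversary $\mathcal{A}$ that has access to $(p^\pm, \mathcal{E}_K, \mathcal{D}_K)$ and attempts
to make $\mathcal{D}_K$ output a non-$\bot$ value. As in the proof of Theorem 1, we
apply a PRP-PRF switch to find

$$\mathbf{Adv}^{\mathrm{auth}}_{\Pi}(\mathcal{A}) = \Pr\left(\mathcal{A}^{p^\pm, \mathcal{E}_K, \mathcal{D}_K} \text{ forges}\right) \le \Pr\left(\mathcal{A}^{f^\pm, \mathcal{E}_K, \mathcal{D}_K} \text{ forges}\right) + \frac{(q_p + \sigma_{\mathcal{E}} + \sigma_{\mathcal{D}})^2}{2^{b+1}}. \qquad (8)$$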
Then we focus on $\mathcal{A}$ having oracle access to $(f^\pm, \mathcal{E}_K, \mathcal{D}_K)$. As before, we assume
without loss of generality that the adversary only makes full-block queries.

We inherit terminology from Theorem 1. The state values corresponding to
encryption and decryption queries will both be labeled $(j, k)$, where $j$ indicates
the query and $k$ the state value within the $j$th query. If needed, we will add
another parameter $\delta \in \{\mathcal{D}, \mathcal{E}\}$ to indicate that a state value $s_{\delta,j,k}$ is in the $j$th
query to oracle $\delta$, for $\delta \in \{\mathcal{D}, \mathcal{E}\}$ and $j \in \{1, \dots, q_\delta\}$. Particularly, this means we
will either label the state values as in (5) with a $\delta$ appended to the subscript, or
simply as $(s_{\delta,j,1}, \dots, s_{\delta,j,\sigma_{\delta,j}})$.

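As before, we employ the collision events guess and hit, but expanded to the
new notation with $\delta = \mathcal{E}$. Next, we define two $\mathcal{D}$-related collision events $\mathcal{D}\mathrm{guess}$
and $\mathcal{D}\mathrm{hit}$. Let $i \in \{1, \dots, q_p\}$, $(\mathcal{D}, j, k)$ be a decryption query index, and $(\delta', j', k')$
be an encryption or decryption query index:

$$\mathcal{D}\mathrm{guess}(i; j, k) \;=\; x_i = s_{\mathcal{D},j,k},$$
$$\mathcal{D}\mathrm{hit}(j, k; \delta', j', k') \;=\; \mathrm{parent}(s_{\mathcal{D},j,k}) \neq \mathrm{parent}(s_{\delta',j',k'}) \,\wedge\, s_{\mathcal{D},j,k} = s_{\delta',j',k'}.$$

We write $\mathcal{D}\mathrm{guess} = \bigvee_{i;j,k} \mathcal{D}\mathrm{guess}(i; j, k)$ and $\mathcal{D}\mathrm{hit} = \bigvee_{j,k;\delta',j',k'} \mathcal{D}\mathrm{hit}(j, k; \delta', j', k')$,
and define $\mathrm{event} = \mathrm{guess} \vee \mathrm{hit} \vee \mathcal{D}\mathrm{guess} \vee \mathcal{D}\mathrm{hit}$.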

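Observe that for the first term on the right-hand side of (8) we get

$$\Pr\left(\mathcal{A}^{f^\pm, \mathcal{E}_K, \mathcal{D}_K} \text{ forges}\right) \le \Pr\left(\mathcal{A}^{f^\pm, \mathcal{E}_K, \mathcal{D}_K} \text{ forges} \mid \neg\mathrm{event}\right) + \Pr\left(\mathcal{A}^{f^\pm, \mathcal{E}_K, \mathcal{D}_K} \text{ sets event}\right). \qquad (9)$$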


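A bound on the probability that $\mathcal{A}$ sets event is derived in Lemma 3.

The remainder of this proof centers on the probability that $\mathcal{A}$ forges given
that event does not happen. Such a forgery requires that $[f(s^{\mathrm{tag}}_{\mathcal{D},j})]^\tau = A_j$ for
some decryption query $j$. By $\neg\mathrm{event}$, we know that $s^{\mathrm{tag}}_{\mathcal{D},j}$ is a new state value
for all $j \in \{1, \dots, q_{\mathcal{D}}\}$, hence $f$'s output under $s^{\mathrm{tag}}_{\mathcal{D},j}$ is independent of all other
values and uniformly distributed for all $j$. As a result, we know that the $j$th
forgery attempt is successful with probability at most $1/2^\tau$. Summing over all
$q_{\mathcal{D}}$ queries, we get

$$\Pr\left(\mathcal{A}^{f^\pm, \mathcal{E}_K, \mathcal{D}_K} \text{ forges} \mid \neg\mathrm{event}\right) \le \frac{q_{\mathcal{D}}}{2^\tau},$$

and the proof is completed via (8), (9), and the bound of Lemma 3, where we again
use that

$$\frac{q_p\sigma_{\mathcal{E}} + \sigma_{\mathcal{E}}^2/2}{2^b} \le \frac{(q_p + \sigma_{\mathcal{E}} + \sigma_{\mathcal{D}})^2}{2^{b+1}}. \qquad \square$$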


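Lemma 3. $\Pr\left(\mathcal{A}^{f^\pm, \mathcal{E}_K, \mathcal{D}_K} \text{ sets event}\right) \le \frac{q_p\sigma_{\mathcal{E}} + \sigma_{\mathcal{E}}^2/2}{2^b} + \left(\frac{8e q_p \sigma_{\mathcal{E}}}{2^b}\right)^{1/2} + \frac{r q_p}{2^c} + \frac{q_p + \sigma_{\mathcal{E}} + \sigma_{\mathcal{D}}}{2^\kappa} + \frac{(q_p + \sigma_{\mathcal{E}})\sigma_{\mathcal{D}} + \sigma_{\mathcal{D}}^2/2}{2^c}$.

The proof of Lemma 3 is given in the full version of this paper [28].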
4 Other CAESAR Submissions

In this section we discuss how the mode security proof of NORX generalizes 
to the CAESAR submissions Ascon, the BLNK mode underlying CBEAM/ 
STRIBOB, ICEPOLE, Keyak, and two out of the three PRIMATEs. Before 
doing so, we make a number of observations and note how the proof can accom- 
modate small design differences. 

— NORX uses domain separation constants at all rounds, but this is not strictly
necessary and other solutions exist. In the privacy and integrity proofs of
NORX, and more specifically at the analysis of state collisions caused by a
decryption query in Lemma 3, the domain separations are only needed at the
transitions between variable-length inputs, such as header to message data
or message to trailer data. This means that the proofs would equally hold if
there were simpler transitions at these positions, such as in Ascon. Alternatively,
the domain separation can be done by using a different primitive, as
in GIBBON and HANUMAN, or a slightly more elaborate padding, as in
BLNK, ICEPOLE, and Keyak;

— The extra permutation evaluations at the initialization and finalization of
NORX are not strictly necessary: in the proof we consider the monotone
event that no state collides assuming no earlier state collision occurred. For
instance, in the analysis of $\mathcal{D}\mathrm{hit}$ in the proof of Lemma 3 we necessarily
have a new input to $p$ at some point, and consequently all next inputs to $p$
are new (except with some probability);

— NORX starts by initializing the state with $\mathrm{init}(K, N) = (K \| N \| 0^{b-\kappa-\nu}) \oplus \mathrm{const}$
for some constant const and then permuting this value. Placing the key
and nonce at different positions of the state does not influence the security
analysis. The proof would also work if, for instance, the header is preceded
with $K \| N$ or a properly padded version thereof and the starting state is $0^b$;

— In a similar fashion, there is no problem in defining the tag to be a different 
r bits of the final state; for instance, the rightmost r bits; 

— Key additions into the capacity part after the first permutation are harm- 
less for the mode security proof. Particularly, as long as these are done at 
fixed positions, these have the same effect as XORing a domain separation 
constant. 

These five modifications allow one to generalize the proof of NORX to Ascon,
CBEAM and STRIBOB, ICEPOLE, Keyak, and two PRIMATEs, GIBBON
and HANUMAN. The only major difference lies in the fact that none of these designs
accommodates a trailer, hence all are functions of the form

$$(C, A) \leftarrow \mathcal{E}_K(N; H, M) \quad \text{and} \quad M/\bot \leftarrow \mathcal{D}_K(N; H, C, A),$$

except for one instance of ICEPOLE which accommodates a secret message
number. Additionally, these designs have $\sigma_\delta \le \lambda_\delta + q_\delta$ for $\delta \in \{\mathcal{D}, \mathcal{E}\}$ (or $\sigma_\delta \le \lambda_\delta + 2q_\delta$
for CBEAM/STRIBOB). We always write $H = (H_1, \dots, H_u)$ and $M = (M_1, \dots, M_v)$
whenever notation permits. In the sections below we elaborate on these
designs separately, where we slightly deviate from the alphabetical order to suit
the presentation. Diagrams of all modes are given in Figure 2. The parameters
and achieved provable security levels of the schemes are given in Table 1.

4.1 Ascon 

Ascon is a submission by Dobraunig et al. HD and is depicted in Figure 2a. It
is originally defined based on two permutations $p_1, p_2$ that differ in the number
of underlying rounds. We discard this difference, considering Ascon with one
permutation $p$.

Ascon initializes its state using init that maps $(K, N)$ to $(0^{b-\kappa-\nu} \| K \| N) \oplus \mathrm{const}$,
where const is determined by some design-specific parameters set prior to
the security experiment. The header and message can be of arbitrary length, and 
are padded to length a multiple of r bits using 10*-padding. An XOR with 1 sep- 
arates header processing from message processing. From the above observations, 
it is clear that the proofs of NORX directly carry over to Ascon. 
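The $10^*$-padding step can be sketched on bit strings as follows. The sketch assumes that a single 1 is always appended, followed by the fewest 0s needed to reach a multiple of $r$, so an input whose length is already a multiple of $r$ gains a full padding block. The XOR with 1 that separates header from message processing is not modelled, and the function name is hypothetical.

```python
def pad10star(bits: str, r: int) -> list:
    """10*-pad a bit string to a multiple of r bits and split it into r-bit blocks.

    Assumption: a 1 is always appended, so an input whose length is already a
    multiple of r receives a full extra padding block.
    """
    padded = bits + "1"
    padded += "0" * (-len(padded) % r)
    return [padded[i:i + r] for i in range(0, len(padded), r)]


if __name__ == "__main__":
    print(pad10star("10101", 4))  # ['1010', '1100']
```

Because the appended 1 marks where the data ends, the padding is injective. This is the property the proof relies on when it restricts attention to full-block queries.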

4.2 ICEPOLE 

ICEPOLE is a submission by Morawiecki et al. [15] and is depicted in Figure 2c.
It is originally defined based on two permutations, $p_1$ and $p_2$, that differ in the
number of underlying rounds. We discard this difference, considering ICEPOLE
with one permutation $p$.

ICEPOLE initializes its state as NORX does, albeit with a different constant.
The header and message can be of arbitrary length, and are padded as
follows. Every block is first appended with a frame bit: 0 for header blocks
$H_1, \dots, H_{u-1}$ and message block $M_v$, and 1 for header block $H_u$ and message
blocks $M_1, \dots, M_{v-1}$. Then, the blocks are padded to length a multiple of $r$
bits using $10^*$-padding. In other words, every padded block of $r$ bits contains at
most $r - 2$ data bits. This form of domain separation using frame bits suffices
for the proof to go through. One variant of ICEPOLE also allows for a secret
message number $M_{\mathrm{secret}}$, which consists of one block and is encrypted prior to
the processing of the header, similar to the message. As this secret message
number is of fixed length, no domain separation is required and the proof can
easily be adapted. From the above observations, it is clear that the proofs of NORX
directly carry over to ICEPOLE. Without going into detail, we note that the
same analysis can be generalized to the parallelized mode of ICEPOLE [15].
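The frame-bit rule described above for ICEPOLE can be sketched with blocks modelled as bit strings. The sketch assumes that every header and message block carries at most $r - 2$ data bits, so the frame bit plus the $10^*$-padding fit into a single $r$-bit block. The secret message number is not modelled, and the function names are hypothetical.

```python
def pad_block_10star(block: str, r: int) -> str:
    """Append 1 and then 0s up to exactly r bits (block must be shorter than r)."""
    if len(block) >= r:
        raise ValueError("block too long for a single r-bit padded block")
    padded = block + "1"
    return padded + "0" * (r - len(padded))


def icepole_frame(header: list, message: list, r: int) -> list:
    """Frame bit 0 for H_1..H_{u-1} and M_v, 1 for H_u and M_1..M_{v-1}."""
    out = []
    for i, h in enumerate(header, start=1):
        frame = "1" if i == len(header) else "0"
        out.append(pad_block_10star(h + frame, r))
    for i, m in enumerate(message, start=1):
        frame = "0" if i == len(message) else "1"
        out.append(pad_block_10star(m + frame, r))
    return out
```

The frame bit of each block tells a parser whether the current phase continues. This is what lets the last header block and the last message block be told apart from the others without separate domain separation constants.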

4.3 Keyak 

Keyak is a submission by Bertoni et al. [16]. The basic mode for the serial case is
depicted in Figure 2d, yet due to its hybrid character it is slightly more general
in nature. It is built on top of SpongeWrap [9].

Fig. 2. CAESAR submission modes discussed in Section 4

Keyak initializes its state by $0^b$, and concatenates $K$, $N$, and $H$ using a special
padding rule:

$$H_{\mathrm{pad}}(K, N, H) = \mathrm{keypack}(K, 240) \,\|\, \mathrm{enc}_8(1) \,\|\, \mathrm{enc}_8(0) \,\|\, N \,\|\, H,$$

where $\mathrm{enc}_8(x)$ is an encoding of $x$ as a byte and $\mathrm{keypack}(K, \ell) = \mathrm{enc}_8(\ell/8) \,\|\, K \,\|\, 1\,0^{(-|K|-1) \bmod (\ell-8)}$.
The key-nonce-header combination $H_{\mathrm{pad}}(K, N, H)$ and message
$M$ can be of arbitrary length, and are padded as follows: first, every block is
appended with two frame bits, being 00 for header blocks $(H_{\mathrm{pad}}(K, N, H))_1, \dots,
(H_{\mathrm{pad}}(K, N, H))_{u-1}$ and 01 for $(H_{\mathrm{pad}}(K, N, H))_u$, and 11 for message blocks
$M_1, \dots, M_{v-1}$ and 10 for $M_v$. Then, the blocks are padded to length a multiple
of $r$ bits using $10^*1$-padding. In other words, every padded block of $r$ bits
contains at most $r - 2$ data bits. This form of domain separation using frame
bits suffices for the proof to go through. Due to the above observations, our proof
readily generalizes to SpongeWrap [9] and DuplexWrap [TB], and thus to Keyak.
Without going into detail, we note that the same analysis can be generalized to
the parallelized mode of Keyak [16]. Additionally, Keyak also supports sessions,
where the state is re-used for a next evaluation. Our proof generalizes to this
case, simply with a more extended description of (5).
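The Keyak framing can be sketched on bit strings in the same way. Two frame bits (00, 01, 11 or 10) are appended to each block of $H_{\mathrm{pad}}(K, N, H)$ and of $M$, followed by $10^*1$-padding to a multiple of $r$ bits. The sketch assumes the blocks of $H_{\mathrm{pad}}(K, N, H)$ are already given and does not model keypack or sessions. The function names are hypothetical.

```python
def pad10star1(bits: str, r: int) -> str:
    """10*1-pad to a multiple of r bits (always adds at least the two bits 1 and 1)."""
    zeros = (-len(bits) - 2) % r
    return bits + "1" + "0" * zeros + "1"


def keyak_frame(hpad_blocks: list, message_blocks: list, r: int) -> list:
    """Append Keyak-style frame bits, then 10*1-pad every block.

    Frame bits: 00 for H_pad blocks 1..u-1, 01 for block u,
    11 for message blocks 1..v-1, and 10 for block v.
    """
    u, v = len(hpad_blocks), len(message_blocks)
    out = []
    for i, h in enumerate(hpad_blocks, start=1):
        out.append(pad10star1(h + ("01" if i == u else "00"), r))
    for i, m in enumerate(message_blocks, start=1):
        out.append(pad10star1(m + ("10" if i == v else "11"), r))
    return out
```

The first frame bit distinguishes header from message, and the second marks whether the phase continues. Together with the injective $10^*1$-padding, this gives the domain separation the proof needs.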

4.4 BLNK (CBEAM and STRIBOB) 

CBEAM and STRIBOB are submissions by Saarinen mmmm- Minaud 
identified an attack on CBEAM [12], but we focus on the modes of operation. 
Both modes are based on the BLNK Sponge mode [18], which is depicted in
Figure 2b.

The BLNK mode initializes its state by 0 6 , compresses K into the state (using 
one or two permutation calls, depending on k), and does the same with TV. 
Then, the mode is similar to Sponge Wrap 0, though using a slightly more 
involved domain separation system similar to the one of NORX. Due to above 
observations, our proof readily generalizes to BLNK [18], and thus to CBEAM 
and STRIBOB. 

4.5 PRIMATEs: GIBBON and HANUMAN 

PRIMATEs is a submission by Andreeva et al. m, and consists of three al- 
gorithms: APE, GIBBON, and HANUMAN. The APE mode is the more ro- 
bust one, and significantly differs from the other two, and from the other CAE- 
SAR submissions discussed in this work, in the way that ciphertexts are de- 
rived and because the mode is secure against nonce misusing adversaries up to 
common prefix [27]. We now focus on GIBBON and HANUMAN, which are
depicted in Figures 2e and 2f. GIBBON is based on three related permutations
$p = (p_1, p_2, p_3)$, where the difference in $p_2, p_3$ is used as domain separation of
the header compression and message encryption phases (the difference of $p_1$ from
$(p_2, p_3)$ is irrelevant for the mode security analysis). Similarly, HANUMAN uses
two related permutations $p = (p_1, p_2)$ for domain separation.

GIBBON and HANUMAN initialize their state using init that maps $(K, N)$
to $0^{b-\kappa-\nu} \| K \| N$. The header and message can be of arbitrary length, and are
padded to length a multiple of r bits using 10*-padding. In case the true header 
(or message) happens to be a multiple of r bits long, the 10*-padding is con- 
sidered to spill over into the capacity. From above observations, it is clear that 
the proofs of NORX directly carry over to GIBBON and HANUMAN. A small
difference appears due to the usage of two different permutations: we need to
make two PRP-PRF switches for each world. Concretely, this means that the first
term in Theorem 1 becomes $\frac{5(q_p + \sigma_{\mathcal{E}})^2}{2^{b+1}}$ and the first term in Theorem 2 becomes
$\frac{3(q_p + \sigma_{\mathcal{E}} + \sigma_{\mathcal{D}})^2}{2^{b+1}}$.
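Returning to the padding of GIBBON and HANUMAN described above, the spill-over rule can be sketched as follows. The sketch assumes that when a non-empty input is already a multiple of $r$ bits, no extra rate block is produced and the padding bit is instead absorbed into the capacity; this is reported here as a flag. How an empty input is treated is an additional assumption, and the function name is hypothetical.

```python
def pad_spill(bits: str, r: int):
    """10*-padding with spill-over into the capacity (sketch).

    Returns (rate_blocks, spilled). For a non-empty input whose length is a
    multiple of r, no padding block is added and spilled is True: the padding
    bit is assumed to be absorbed into the capacity part instead.
    """
    if bits and len(bits) % r == 0:
        return [bits[i:i + r] for i in range(0, len(bits), r)], True
    padded = bits + "1"
    padded += "0" * (-len(padded) % r)
    return [padded[i:i + r] for i in range(0, len(padded), r)], False
```

The flag makes visible why the rule remains injective: two inputs that yield the same rate blocks differ in whether a bit was placed in the capacity.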

5 Conclusions 

In this work we analyzed one of the Sponge-based authenticated encryption 
designs in detail, NORX, and proved that it achieves security of approximately 
$\min\{2^{b/2}, 2^c, 2^\kappa\}$, significantly improving upon the traditional bound of
$\min\{2^{c/2}, 2^\kappa\}$. Additionally, we showed that this proof straightforwardly
generalizes to five other CAESAR modes, Ascon, BLNK (of CBEAM/STRIBOB),
ICEPOLE, Keyak, and PRIMATEs. Our findings indicate an overly conserva- 
tive parameter choice made by the designers, implying that some designs can 
improve speed by a factor of 4 at barely any security loss. 

It is expected that the security proofs also generalize to the modes of Artemia
[21] and π-Cipher [22]. However, they deviate slightly more from the other
designs. Artemia is based on the JH hash function [29] and XORs data blocks in
both the rate and capacity part. It does not use domain separations; rather, it
encodes the lengths of the inputs into the padding at the end [30]. Therefore, a
generalization of the proof of NORX to Artemia is not entirely straightforward.
π-Cipher, on the other hand, is structurally different in the way it maintains
state. A so-called “common internal state” is used throughout the evaluation.
For the processing of the header (or similarly the message) the state is forked
into $u$ chains to process $H_1, \dots, H_u$ in parallel, resulting in $u$ tag values, which
are added into the common internal state. Due to this design property, the
deviation of π-Cipher from NORX is too large to simply claim that the proof carries
over.

The results in this work are derived in the ideal permutation model, where the 
underlying primitive is assumed to be ideal. We acknowledge that this model does 
not perfectly reflect the properties of the primitives. For instance, it is stated by 
the designers of Ascon, NORX, and PRIMATEs that non-random (but harmless) 
properties of the underlying permutation exist. Furthermore, it is important 
to realize that the proofs of security for the modes of operation in the ideal 
model do not have a direct connection with security analysis performed on the 
permutations, as is the case with block ciphers modes of operation. Nevertheless, 
we can use these proofs as heuristics to guide cryptanalysts to focus on the 
underlying permutations, rather than the modes themselves. 
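A Proof Technique Used in Lemma 2

Formally, the proof technique used in Lemma 2 relies on the following paradigm.

Note that there is an ordering of the $q_p + \sigma_{\mathcal{E}}$ primitive queries, and we can
reformulate guess$(\ell)$, hit$(\ell)$, key$(\ell)$, and multi$(\ell)$ for $\ell = 1, \dots, q_p + \sigma_{\mathcal{E}}$ analogously.
Defining $\mathrm{event}(\ell) = \mathrm{guess}(\ell) \vee \mathrm{hit}(\ell)$ and $\mathrm{help}(\ell) = \mathrm{key}(\ell) \vee \mathrm{multi}(\ell)$, then

$$\Pr(\mathrm{event}) \le \Pr\big(\mathrm{event}(q_p + \sigma_{\mathcal{E}}) \mid \neg\mathrm{event}(1 \cdots q_p + \sigma_{\mathcal{E}} - 1) \wedge \neg\mathrm{help}(1 \cdots q_p + \sigma_{\mathcal{E}})\big) + \Pr\big(\mathrm{event}(1 \cdots q_p + \sigma_{\mathcal{E}} - 1) \vee \mathrm{help}(1 \cdots q_p + \sigma_{\mathcal{E}})\big),$$

and inductively

$$\Pr(\mathrm{event}) \le \sum_{\ell=1}^{q_p + \sigma_{\mathcal{E}}} \Big[\Pr\big(\mathrm{event}(\ell) \mid \neg\mathrm{event}(1 \cdots \ell - 1) \wedge \neg\mathrm{help}(1 \cdots \ell)\big) + \Pr\big(\mathrm{help}(\ell) \mid \neg\mathrm{help}(1 \cdots \ell - 1)\big)\Big].$$

This formulation would, however, merely reduce the readability of the proof.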
Abstract. Scenarios in which authenticated encryption schemes output
decrypted plaintext before successful verification raise many security
issues. These situations are sometimes unavoidable in practice, such
as when devices have insufficient memory to store an entire plaintext, 
or when a decrypted plaintext needs early processing due to real-time 
requirements. We introduce the first formalization of the releasing unver- 
ified plaintext (RUP) setting. To achieve privacy, we propose using plain- 
text awareness (PA) along with IND-CPA. An authenticated encryption 
scheme is PA if it has a plaintext extractor , which tries to fool adversaries 
by mimicking the decryption oracle, without the secret key. Releasing un- 
verified plaintext to the attacker then becomes harmless as it is infeasible 
to distinguish the decryption oracle from the plaintext extractor. We in- 
troduce two notions of plaintext awareness in the symmetric-key setting, 
PA1 and PA2, and show that they expose a new layer of security be- 
tween IND-CPA and IND-CCA. To achieve integrity, INT-CTXT in the 
RUP setting is required, which we refer to as INT-RUP. These new se- 
curity notions are compared with conventional definitions, and are used 
to make a classification of symmetric-key schemes in the RUP setting. 
Furthermore, we re-analyze existing authenticated encryption schemes, 
and provide solutions to fix insecure schemes. 


1 Introduction 

The goal of authenticated encryption (AE) is to simultaneously provide data 
privacy and integrity. AE decryption conventionally consists of two phases: plain- 
text computation and verification. As reflected in classical security models, plain- 
text coming from decryption is output only upon successful verification. 
Nevertheless, there are settings where releasing plaintext before verification
is desirable. For example, it is necessary if there is not enough memory to store
the entire plaintext [21] or because real-time requirements would otherwise not
be met [15, 38]. Even beyond these settings, using dedicated schemes secure
against the release of unverified plaintext can increase efficiency. For instance, 
to avoid releasing unverified plaintext into a device with insecure memory mi 
the two-pass Encrypt-then-MAC composition can be used: a first pass to verify 
the MAC, and a second to decrypt the ciphertext. However, a single pass AE 
scheme suffices if it is secure against the release of unverified plaintext. 

If the attacker cannot observe the unverified plaintext directly, it may be 
possible to determine properties of the plaintext through a side channel. This 
occurs, for example, in the padding oracle attacks introduced by Vaudenay [39] , 
where an error message or the lack of an acknowledgment indicates whether 
the unverified plaintext was correctly padded. Canvel et al. [18] showed how 
to mount a padding oracle attack on the then-current version of OpenSSL by
exploiting timing differences in the decryption processing of TLS. As shown by 
Paterson and AlFardan urn for TLS and DTLS, it is very difficult to prevent 
an attacker from learning the cause of decryption failures. 

The issue of releasing unverified plaintext has also been acknowledged and
explicitly discussed in the upcoming CAESAR competition¹: “Beware that security
questions are raised by any authenticated cipher that handles a long ciphertext 
in one pass without using a large buffer: releasing unverified plaintext to applica- 
tions often means releasing it to attackers and also requires an analysis of how 
the applications will react.” 

For several AE schemes, including OCB [27], AEGIS [?], ALE [15], and
FIDES [?], the designers explicitly stress that unverified plaintext cannot be
released. Although the issue of releasing unverified plaintext (RUP) in AE is
frequently discussed in the literature, it has largely remained unaddressed even
in recent AE proposals, likely due to a lack of comprehensive study.

We mention explicitly that we do not recommend omitting verification, which
remains essential to preventing incorrect plaintexts from being accepted. To
ensure maximal security, unverified plaintext must be kept hidden from adversaries.
However, our scenario assumes that the attacker can see the unverified plaintext,
or any information relating to it, before verification is complete. Furthermore,
issues related to the behavior of applications which process unverified plaintext
are beyond the scope of this paper; careful analysis is necessary in such situations.


1.1 Security Under Release of Unverified Plaintext 

AE security is typically examined using indistinguishability under chosen-plaintext
attack (IND-CPA) for privacy and integrity of ciphertexts (INT-CTXT) for
integrity, and a scheme which achieves both is indistinguishable under
chosen-ciphertext attack (IND-CCA), as shown by Bellare and Namprempre [9] and
Katz and Yung [26]. However, in the RUP situation adversaries can also observe


¹ http://competitions.cr.yp.to/features.html
Fig. 1. The two plaintext-aware settings (PA1 and PA2) used in the paper, where
D is an adversary. Not shown in the figure is the type of IV used by the \(\mathcal{E}_K\)
oracle (cf. Sect. 3.2). Left: real world, with encryption oracle \(\mathcal{E}_K\) and decryption
oracle \(\mathcal{D}_K\). Right: simulated world, with encryption oracle \(\mathcal{E}_K\) and plaintext
extractor E. The plaintext extractor E is a stateful algorithm that has neither
knowledge of the secret key K nor access to the encryption oracle \(\mathcal{E}_K\). The dotted
line indicates that E has access to the encryption queries made by adversary D,
which only holds in the PA1 setting.


unverified plaintext, which the conventional definitions do not take into account.
To address this gap we introduce two definitions: integrity under releasing
unverified plaintext (INT-RUP) and plaintext awareness (PA). For integrity we
propose using INT-RUP and for privacy both IND-CPA and PA. In the full
version of this paper [3], we discuss how the combination of INT-RUP, IND-CPA,
and PA measures the impact of releasing unverified plaintext on security.

INT-RUP. The goal of an adversary under INT-CTXT is to produce new
ciphertexts which pass verification, with access only to the encryption and
verification oracles. We translate INT-CTXT into the RUP setting, called INT-RUP,
by allowing the adversary to observe unverified plaintexts. We formalize this by
separating plaintext computation from verification, and giving the adversary
access to a plaintext-computing oracle.

Plaintext Awareness (PA). We introduce PA as a new symmetric-key notion 
to achieve security in the RUP setting. Informally, we define a scheme to be PA 
if the adversary cannot gain any additional knowledge about the plaintext from 
decryption queries besides what it can derive from encryption queries. 

Our PA notion only involves encryption and decryption, and can thus be 
defined both for encryption schemes as well as for AE schemes that release 
unverified plaintext. 

At the heart of our new PA notion is the plaintext extractor, shown in Fig. 1.
We say that an encryption scheme is PA if it has an efficient plaintext extractor,
which is a stateful algorithm that mimics the decryption oracle in order to fool
the adversary. It can make neither encryption nor decryption queries, and does
not know the secret key. We define two notions of plaintext awareness: PA1 and PA2.
The extractor is given access to the history of queries made to the encryption
oracle in PA1, but not in PA2. Hence PA1 is used to model RUP scenarios in
which the goal of the adversary is to gain knowledge beyond what it knows from
Fig. 2. Implications and separations between the IND-CPA, IND-CPA+PA1,
IND-CPA+PA2, IND-CCA, IND-CCA', PA2, and DI security notions (left) and
INT-CTXT and INT-RUP (right). Dashed lines refer to relations that hold if the
IV is random, and thin solid lines to relations that hold for nonce or arbitrary
IVs. We use a thick solid line if the relation holds under all IV cases.


the query history. For situations in which the goal of the adversary is to decrypt 
one of the ciphertexts in the query history, we require PA2. 

Relations among Notions. PA for public-key encryption was introduced by
Bellare and Rogaway [?] and later defined without random oracles by Bellare
and Palacio [10]. In the symmetric-key setting, our definition of PA is somewhat
similar; however, there are important technical differences which make the
public-key results inapplicable to the symmetric-key setting.

Relations among the PA and conventional security definitions for encryption
(see Sect. 3.3) are summarized in Fig. 2. We consider three IV assumptions:
random IV, nonce IV (non-repeating value), and arbitrary IV (value that can
be reused), as explained in Sect. 3.2. The statements of the theorems and proofs
can be found in the full version of this paper [3].

The motivation for having two separate notions, PA1 and PA2, is as
follows. As we prove in this work, if the plaintext extractor has access to the
query history (PA1), then there are no implications between IND-CPA+PA1
and IND-CCA. However, if we modify plaintext awareness so that the plaintext
extractor no longer has access to the query history (PA2), then we can prove
that IND-CPA+PA2 implies IND-CCA'. IND-CCA' is a strengthened version of
IND-CCA, where we allow the adversary to re-encrypt the outputs of the
decryption oracle. Note that such a re-encryption is always possible in the
public-key setting, where the encryption key is public, but not in the
symmetric-key setting, where the key required for encryption is secret.
Furthermore, we also prove that PA2 is equivalent to the notion of decryption
independence (DI). DI captures the fact that encryption and decryption under
the same key are only related to each other as much as encryption and
decryption under different keys.

Finally, although INT-RUP clearly implies INT-CTXT, the opposite is not 
necessarily true. 

Motivating Examples. To get an intuition for PA1 (shown in Fig. 1) and
how it relates to the RUP setting, we provide two motivating examples with
CTR mode. For simplicity, we define the encryption function of CTR mode as
\(\mathcal{E}_K(IV, M) = E_K(IV) \oplus M\), where the message M and the initialization value
IV consist of one block each, and \(E_K\) is a block cipher with a secret key K.
The corresponding decryption function is \(\mathcal{D}_K(IV, C) = E_K(IV) \oplus C\). As shown
in [13], CTR mode is IND-CPA but not IND-CCA, a result that holds for nonce
IVs (unique non-repeating values) as well as for random IVs.
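The one-block CTR definition above can be written out directly. The sketch below is a minimal Python illustration, assuming HMAC-SHA256 truncated to 128 bits as a stand-in for the block cipher \(E_K\) (CTR mode only needs a PRF here); the function names are hypothetical. Note that decryption never rejects: every pair (IV, C) yields some plaintext.

```python
import hashlib
import hmac

BLOCK = 16  # one 128-bit block, in bytes


def prf(key: bytes, block: bytes) -> bytes:
    """Stand-in for E_K: HMAC-SHA256 truncated to one block (assumption)."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_encrypt(key: bytes, iv: bytes, m: bytes) -> bytes:
    """E_K(IV, M) = E_K(IV) xor M for a single block M."""
    return xor(prf(key, iv), m)


def ctr_decrypt(key: bytes, iv: bytes, c: bytes) -> bytes:
    """D_K(IV, C) = E_K(IV) xor C: the same keystream block, applied to C."""
    return xor(prf(key, iv), c)
```

Because decryption is just an XOR with a keystream block, flipping bits of C flips the same bits of the released plaintext; this malleability is one way to see why CTR mode is not IND-CCA.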

1. Nonce IV CTR mode is not PA1. Following Rogaway [32], we assume that
an adversary is free to specify the IV for encryption and decryption queries,
as long as it does not make two encryption queries with the same nonce IV,
N. In the attack, an adversary first makes a decryption query (N, C) with
nonce N and one-block ciphertext C to obtain a message M. The correct
decryption is \(M = E_K(N) \oplus C\), as output by the decryption oracle. The
adversary then computes the keystream \(\kappa := M \oplus C\). Now in a second query
(N, M'), this time to the encryption oracle, the adversary obtains C', where
\(C' = M' \oplus \kappa\).

The scheme fails to be plaintext aware, as it is infeasible for any plaintext
extractor to be consistent with subsequent encryption queries. Specifically,
the plaintext extractor cannot compute \(\kappa\) at the time of the first decryption
query for the following reasons: it does not know the secret key K, it is not
allowed to make encryption queries, and an encryption query with N has not
yet been recorded in the query history.

2. Random IV CTR mode is PA1. In this setting, the IV used in encryption is
chosen randomly by the environment, and therefore out of the attacker’s
control. However, the adversary can still freely choose the IV for its decryption
queries. In this random IV setting, the attack in the nonce IV example does
not apply. To see this, consider an adversary which queries the decryption
of \((IV_1, C)\) with a one-block ciphertext C. It can compute the keystream
associated to \(IV_1\), but does not control when \(IV_1\) is used in encryption. Thus,
a plaintext extractor can be defined as outputting a random plaintext M in
response to the \((IV_1, C)\) query.

But what if an adversary makes additional decryption queries with the same
IV? Suppose the adversary makes decryption query \((IV_1, C \oplus \Delta)\). Since the
plaintext extractor is a stateful algorithm, it can simply output \(M \oplus \Delta\) to
provide consistency. Furthermore, if an adversary makes encryption queries,
these will be seen by the PA1 plaintext extractor. Therefore, the plaintext
extractor can calculate the keystream from these queries, and respond to
any decryption queries in a consistent way. A proof that random IV CTR
mode is PA1 is provided in Prop. 2.
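The nonce IV attack in item 1 can be replayed in a few lines. The sketch below is a toy model under stated assumptions: HMAC-SHA256 truncated to one block stands in for \(E_K\), and the names `NonceCTR`, `RandomAnswerExtractor`, and `pa1_attack` are hypothetical. The extractor shown is only a naive one that answers with random blocks; the argument in item 1 says that no extractor can do better, since the encryption query with N happens only after the decryption query.

```python
import hashlib
import hmac
import secrets

BLOCK = 16


def prf(key: bytes, block: bytes) -> bytes:
    """Stand-in for E_K (assumption): HMAC-SHA256 truncated to one block."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class NonceCTR:
    """Real world: one-block nonce-IV CTR encryption and decryption oracles."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self._used = set()

    def encrypt(self, nonce: bytes, m: bytes) -> bytes:
        if nonce in self._used:  # encryption queries must not repeat a nonce
            raise ValueError("nonce reuse in an encryption query")
        self._used.add(nonce)
        return xor(prf(self._key, nonce), m)

    def decrypt(self, nonce: bytes, c: bytes) -> bytes:
        # Decryption queries may use any nonce, even one not yet used for encryption.
        return xor(prf(self._key, nonce), c)


class RandomAnswerExtractor(NonceCTR):
    """Simulated world with a naive extractor: same encryption oracle,
    but decryption queries are answered with random blocks (no key used)."""

    def decrypt(self, nonce: bytes, c: bytes) -> bytes:
        return secrets.token_bytes(BLOCK)


def pa1_attack(world: NonceCTR) -> bool:
    """Returns True if the world behaves like real nonce-IV CTR."""
    n, c = secrets.token_bytes(BLOCK), secrets.token_bytes(BLOCK)
    m = world.decrypt(n, c)        # decryption query (N, C) comes first
    kappa = xor(m, c)              # keystream for N
    m2 = secrets.token_bytes(BLOCK)
    c2 = world.encrypt(n, m2)      # later encryption query (N, M')
    return c2 == xor(m2, kappa)    # C' = M' xor kappa in the real world
```

In the real world `pa1_attack` always returns `True`; against the random-answer extractor it returns `True` only with probability \(2^{-128}\).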

AE schemes such as GCM [28] and CCM [40] reduce to CTR mode in the RUP
setting. This is because the adversary does not need to forge a ciphertext in 
order to obtain information about the corresponding (unverified) plaintext. By 
requiring that the underlying encryption scheme of an AE scheme is PA1, we 
ensure that the adversary does not gain any information from decryption queries, 
meaning no decryption query can be used to find an inconsistency with any past 
or future queries to the encryption or decryption oracles. 
1.2 Analysis of Authenticated Encryption Schemes

Given the formalization of AE in the RUP setting, we categorize existing AE 
schemes based on the type of IV used by the encryption function: random IV, 
nonce IV, and arbitrary IV. Then, we re-analyze the security of several recently 
proposed AE schemes as well as more established AE schemes. In order to do 
so, we split the decryption algorithms into two parts, plaintext computation and 
verification, as described in Sect. 3.1.

For integrity, we show that OCB [33] and COPA [4] succumb to attacks by 
using unverified plaintext to construct forgeries. For privacy, an overview of our
results can be seen in Table 1, where we also include the encryption-only modes
CTR and CBC as random IV examples. We draw a distinction between the 
schemes that are online and the schemes that are not, where an online scheme 
is one that is able to produce ciphertext blocks as it receives plaintext blocks. 

Most of the schemes in Table 1 fail to achieve PA1. We therefore
demonstrate techniques to restore PA1 for nonce IV and arbitrary IV schemes.
For the former, we introduce the nonce decoy technique, and for the latter the
PRF-to-IV method, which converts a random IV PA1 scheme into an arbitrary IV PA1
scheme. For online arbitrary IV schemes, we demonstrate that PA1 security can 
be achieved only if the ciphertext is substantially longer than the plaintext, or 
the decryption is offline. We show that McOE-G [20] achieves PA1 if the plain- 
text is padded so that the ciphertext becomes twice as long. We also prove that 
APE [2], an online deterministic AE scheme with offline decryption, achieves 
PA1. 

Finally we show that the nonce decoy preserves INT-RUP, and the PRF-to-IV 
method turns any random IV scheme into an INT-RUP arbitrary IV scheme. 

1.3 Background and Related Work 

The definition of encryption and AE has been extended and generalized in
different ways. In 2004, Rogaway [32] introduced nonce IV encryption schemes, in
contrast with prior encryption modes that used a random IV, as in the CBC
mode standardized by NIST in 1980 [29].

Rogaway and Shrimpton [34] formalized deterministic AE (DAE), where an IV 
input is optional and can therefore take arbitrary values. Secure DAE differs from 
secure nonce IV AE schemes in that DAE privacy is possible only up to message 
repetition, namely an adversary can detect repeated encryptions. Unfortunately, 
DAE schemes cannot be online. To resolve this issue, Fleischmann et al. [20] 
explored online DAE schemes, where privacy holds only up to repetitions of 
messages with identical prefixes or up to the longest common prefix. 

Tsang et al. [38] gave syntax and security definitions of AE for streaming data.
Bellare and Keelveedhi [6] considered a stronger security model where data may
be key-dependent. Boldyreva et al. reformulated AE requirements and properties
to handle ciphertext fragmentation in [?], and enhanced the syntax and security
definitions so that the verification oracle is allowed to handle multiple failure
events in [?]. Our formalization can be interpreted as a special case of the work
in [?], yet the emphasis and results differ.
Table 1. PA1 and PA2 security of deterministic and non-deterministic schemes,
separated as described in Sect. 3.1. In the columns for PA1 and PA2, ✓ means
secure (there exists an extractor), and ✗ means insecure (there exists an attack).
Proofs for the security results in this table can be found in Sect. 5.

IV type    Online  Scheme                       PA1  PA2  Remark
random     ✓       CTR, CBC [29]                ✓    ✗
nonce      ✓       OCB [33]                     ✗    ✗
nonce      ✓       GCM [28], SpongeWrap [14]    ✗    ✗
nonce      ✗       CCM [40]                     ✗    ✗    not online [35]
arbitrary  ✓       COPA [4]                     ✗    ✗    privacy up to prefix
arbitrary  ✓       McOE-G [20]                  ✗    ✗    privacy up to prefix
arbitrary  ✓       APE [2]                      ✓    ✗    privacy up to prefix, backwards decryption
arbitrary  ✗       SIV [34], BTM [?], HBS [?]   ✓    ✗    privacy up to repetition
arbitrary  ✗       Encode-then-Encipher [12]    ✓    ✓    privacy up to repetition, VIL SPRP, padding


2 Preliminaries 

Symbols. Given two strings A and B in \(\{0,1\}^*\), we use \(A \| B\) and AB
interchangeably to denote the concatenation of A and B. The symbol \(\oplus\) denotes
the bitwise XOR operation of two strings. Addition modulo \(2^n\) is denoted by +,
where n usually is the bit length of a block. For example, in the CTR mode of
operation of a block cipher, we increment the IV value by addition \(IV + i \pmod{2^n}\),
where n is the block size: the n-bit string \(IV = IV_{n-1} \cdots IV_1 IV_0 \in \{0,1\}^n\) is
converted to an integer \(2^{n-1} IV_{n-1} + \cdots + 2 IV_1 + IV_0 \in \{0, 1, \ldots, 2^n - 1\}\),
and the result of addition is converted to an n-bit string in the reverse way. By
\(K \xleftarrow{\$} \mathcal{K}\) we mean that K is chosen uniformly at random from the set \(\mathcal{K}\). All
algorithms and adversaries are considered to be “efficient”.
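The conversion between n-bit strings and integers used for \(IV + i \pmod{2^n}\) can be sketched as follows, assuming (as in the text) that the leftmost bit \(IV_{n-1}\) is the most significant, and additionally assuming that n is a multiple of 8 so the IV fits exactly into bytes; `iv_add` is a hypothetical helper name.

```python
def iv_add(iv: bytes, i: int) -> bytes:
    """Return IV + i (mod 2^n) for an n-bit IV given as bytes, n = 8 * len(iv).

    Big-endian conversion matches 2^(n-1) IV_{n-1} + ... + 2 IV_1 + IV_0,
    where IV_{n-1} is the leftmost bit of the string.
    """
    n = 8 * len(iv)
    value = int.from_bytes(iv, "big")        # n-bit string -> integer
    value = (value + i) % (1 << n)           # addition modulo 2^n
    return value.to_bytes(len(iv), "big")    # integer -> n-bit string
```

For example, `iv_add(bytes([0xFF] * 16), 1)` wraps around to sixteen zero bytes, which is the behavior a CTR counter needs at \(2^n - 1\).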

Adversaries and Advantages. An adversary is an oracle Turing machine. Let
\(\mathbb{D}\) be some class of computationally bounded adversaries; a class \(\mathbb{D}\) can consist
of a single adversary D, i.e. \(\mathbb{D} = \{D\}\), in which case we simply write D instead
of \(\mathbb{D}\). For convenience, we use the notation

\[ \Delta_{\mathbb{D}}(f ; g) := \sup_{D \in \mathbb{D}} \left| \Pr[D^{f} = 1] - \Pr[D^{g} = 1] \right| \]

to denote the supremum of the distinguishing advantages over all adversaries
distinguishing oracles f and g, where the notation \(D^{O}\) indicates the value output
by D after interacting with oracle O. The probabilities are defined over the
random coins used in the oracles and the random coins of the adversary, if
any. Multiple oracles are separated by a comma, e.g. \(\Delta(f_1, f_2 ; g_1, g_2)\) denotes
distinguishing the combination of \(f_1\) and \(f_2\) from the combination of \(g_1\) and \(g_2\).
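For a single adversary (the class \(\{D\}\)), the quantity \(\Delta_D(f ; g)\) is a difference of two probabilities and can be estimated by sampling. The Monte Carlo sketch below is only an illustration with assumed toy oracles: `f` is \(x \mapsto x \oplus k\) for a random 8-bit k (a permutation), `g` is a lazily sampled random function on 8-bit values, and D outputs 1 when \(O(0) \oplus O(1) = 1\). The names are hypothetical, and a supremum over a whole class of adversaries cannot be computed this way.

```python
import random


def estimate_advantage(distinguisher, make_f, make_g, trials=20000, seed=1):
    """Empirical |Pr[D^f = 1] - Pr[D^g = 1]| for one fixed distinguisher D."""
    rng = random.Random(seed)
    hits_f = sum(distinguisher(make_f(rng)) for _ in range(trials))
    hits_g = sum(distinguisher(make_g(rng)) for _ in range(trials))
    return abs(hits_f - hits_g) / trials


def make_xor_oracle(rng):
    k = rng.randrange(256)  # fresh oracle key for every run
    return lambda x: x ^ k


def make_random_function(rng):
    table = {}  # lazy sampling keeps repeated answers consistent

    def g(x):
        if x not in table:
            table[x] = rng.randrange(256)
        return table[x]

    return g


def d(oracle):
    # (0 ^ k) ^ (1 ^ k) == 1 always holds for the XOR oracle.
    return 1 if oracle(0) ^ oracle(1) == 1 else 0
```

Here the estimate comes out close to \(1 - 1/256 \approx 0.996\), since the random function passes the test only by chance.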

If D is distinguishing \((f_1, f_2, \ldots, f_k)\) from \((g_1, g_2, \ldots, g_k)\), then by \(O_i\) we mean
the ith oracle that D has access to, i.e. either \(f_i\) or \(g_i\) depending upon which
oracles it is interacting with. By \(O_i \hookrightarrow O_j\) we describe a set of actions that D
can perform: first D queries \(O_i\), and then at some point in the future D queries
\(O_j\) with the output of \(O_i\), assuming the output of \(O_i\) can be used directly as the
input for \(O_j\). If the oracles \(O_i\) and \(O_j\) represent a family of algorithms indexed
by inputs, then the indices must match. For example, say that \(\mathcal{E}_K^{N,A}\) and \(\mathcal{D}_K^{N,A}\)
are families indexed by (N, A). Then \(\mathcal{E}_K \hookrightarrow \mathcal{D}_K\) describes a set of actions, which
includes querying \(\mathcal{E}_K^{N,A}(M)\) to receive C, and then at some point in the future
querying \(\mathcal{D}_K^{N,A}(C)\), where K, N, A, and C are reused.

Our security definitions follow [9] and are given in terms of adversary
advantages. A scheme is said to be secure with respect to some definition if the
corresponding advantage is negligible for all adversaries with time complexity
polynomial in the security parameter. As in [5], positive results are given as
explicit bounds, whereas negative results, i.e. separations, are given in
asymptotic terms, which can easily be converted into concrete bounds.

Online Functions. A function \(f : \mathcal{M} \to \mathcal{C}\) is said to be n-online if there exist
functions \(f_i : \{0,1\}^i \to \{0,1\}^{c_i}\) and \(f'_i : \{0,1\}^i \to \{0,1\}^{c'_i}\) such that \(c_i > 0\), and
for all \(M \in \mathcal{M}\) we have

\[ f(M) = f_n(M_1)\, f_{2n}(M_1 M_2) \cdots f_{jn}(M_1 M_2 \cdots M_j)\, f'_{|M|}(M) , \]

where \(j = \lfloor (|M| - 1)/n \rfloor\) and \(M_i\) is the ith n-bit block of M. Often we just say
f is online if the value n is clear from context.
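The online property can be probed empirically. The sketch below assumes byte-aligned blocks of n bytes and pieces \(f_{in}\) that each output exactly one block (\(c_i\) equal to one block), which covers CTR-like encryption. It checks a necessary condition on one message: for every prefix P of j full blocks, f(P) and f(M) agree on their first j - 1 blocks (the last piece \(f'_{|P|}\) may depend on all of P). The functions `ctr_like`, `whole_message_mask`, and `looks_online` are hypothetical examples.

```python
import hashlib


def ctr_like(m: bytes, n: int = 16) -> bytes:
    """Online toy: each block is masked with a value depending only on its position."""
    out = bytearray()
    for start in range(0, len(m), n):
        mask = hashlib.sha256(start.to_bytes(8, "big")).digest()[:n]
        out += bytes(a ^ b for a, b in zip(m[start:start + n], mask))
    return bytes(out)


def whole_message_mask(m: bytes, n: int = 16) -> bytes:
    """Not online: every output byte depends on the entire message (n is unused)."""
    mask = hashlib.sha256(m).digest()
    return bytes(a ^ mask[i % len(mask)] for i, a in enumerate(m))


def looks_online(f, m: bytes, n: int = 16) -> bool:
    """Necessary condition for n-online with one n-byte output block per input block."""
    full = f(m, n)
    for j in range(1, len(m) // n + 1):
        keep = (j - 1) * n
        if f(m[: j * n], n)[:keep] != full[:keep]:
            return False
    return True
```

A passing check does not prove the property, but a failing one shows that a function is not online in this sense.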

3 AE Schemes: Syntax, Types, and Security 

3.1 New AE Syntax 

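A conventional AE scheme \(\Pi = (\mathcal{E}, \mathcal{D})\) consists of an encryption algorithm \(\mathcal{E}\)
and a decryption algorithm \(\mathcal{D}\):

\[ (C, T) \leftarrow \mathcal{E}_K^{IV,A}(M) , \qquad M/\bot \leftarrow \mathcal{D}_K^{IV,A}(C, T) , \]

where \(K \in \mathcal{K}\) is a key, \(IV \in \mathcal{IV}\) an initialization value, \(A \in \mathcal{A}\) associated data,
\(M \in \mathcal{M}\) a message, \(C \in \mathcal{C}\) the ciphertext, \(T \in \mathcal{T}\) the tag, and each of these sets is
a subset of \(\{0,1\}^*\). The correctness condition states that for all K, IV, A, and
M, \(\mathcal{D}_K^{IV,A}(\mathcal{E}_K^{IV,A}(M)) = M\). A secure AE scheme should return \(\bot\) when it does
not receive a valid (C, T) tuple.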

In order to consider what happens when unverified plaintext is released, we
disconnect the decryption algorithm from the verification algorithm so that the
decryption algorithm always releases plaintext. A separated AE scheme is a
triplet \(\Pi = (\mathcal{E}, \mathcal{D}, \mathcal{V})\) of keyed algorithms (encryption \(\mathcal{E}\), decryption \(\mathcal{D}\), and
verification \(\mathcal{V}\)) such that

\[ (C, T) \leftarrow \mathcal{E}_K^{IV,A}(M) , \qquad M \leftarrow \mathcal{D}_K^{IV,A}(C, T) , \qquad \top/\bot \leftarrow \mathcal{V}_K^{IV,A}(C, T) , \]


How to Securely Release Unverified Plaintext in Authenticated Encryption 


113 


where K, IV, A, M, C, and T are defined as above. Note that in some
deterministic schemes IV may be absent, in which case we can expand the interface
of such schemes to receive an IV input with which they do nothing. Furthermore,
for simplicity we might omit A if there is no associated data. The special symbols
\(\top\) and \(\bot\) indicate the success and failure of the verification process, respectively.

As in the conventional setting we impose a correctness condition: for all K,
IV, A, and M such that \(\mathcal{E}_K^{IV,A}(M) = (C, T)\), we require \(\mathcal{D}_K^{IV,A}(C, T) = M\) and
\(\mathcal{V}_K^{IV,A}(C, T) = \top\).

Relation to Conventional Syntax. Given a separated AE scheme \(\Pi = (\mathcal{E}, \mathcal{D}, \mathcal{V})\),
we can easily convert it into a conventional AE scheme \(\overline{\Pi} = (\mathcal{E}, \overline{\mathcal{D}})\). Remember
that the conventional decryption oracle \(\overline{\mathcal{D}}_K^{IV,A}(C, T)\) outputs M, where
\(M = \mathcal{D}_K^{IV,A}(C, T)\), if \(\mathcal{V}_K^{IV,A}(C, T) = \top\), and \(\bot\) otherwise.

The conversion in the other direction is not immediate. While the verification
algorithm \(\mathcal{V}\) can be easily “extracted” from \(\overline{\mathcal{D}}\) (i.e., one can easily construct \(\mathcal{V}\)
using \(\overline{\mathcal{D}}\): just replace M with \(\top\)), it is not clear if one can always “naturally”
extract the decryption algorithm \(\mathcal{D}\) from \(\overline{\mathcal{D}}\). However, all practical AE schemes
that we are aware of can be constructed from a triplet \((\mathcal{E}, \mathcal{D}, \mathcal{V})\) as above, and
hence their decryption algorithms \(\overline{\mathcal{D}}\) are all separable into \(\mathcal{D}\) and \(\mathcal{V}\).
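The separated syntax \((\mathcal{E}, \mathcal{D}, \mathcal{V})\) and the conversion to a conventional decryption algorithm can be sketched with a toy Encrypt-then-MAC scheme. Everything here is an assumption made for illustration: SHA-256 in counter mode as the keystream, HMAC-SHA256 over a length-prefixed encoding of (IV, A, C) as the tag, `True`/`False` for \(\top\)/\(\bot\), and `None` for the rejection output of the conventional decryption. The class and function names are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass(frozen=True)
class SeparatedAE:
    k_enc: bytes
    k_mac: bytes

    def _tag(self, iv: bytes, ad: bytes, c: bytes) -> bytes:
        # Length-prefix IV and A so the MAC input encodes (IV, A, C) unambiguously.
        data = len(iv).to_bytes(8, "big") + iv + len(ad).to_bytes(8, "big") + ad + c
        return hmac.new(self.k_mac, data, hashlib.sha256).digest()

    def encrypt(self, iv: bytes, ad: bytes, m: bytes):
        c = _xor(m, _keystream(self.k_enc, iv, len(m)))
        return c, self._tag(iv, ad, c)

    def decrypt(self, iv: bytes, ad: bytes, c: bytes, t: bytes) -> bytes:
        # Always releases a plaintext; the tag is not checked here.
        return _xor(c, _keystream(self.k_enc, iv, len(c)))

    def verify(self, iv: bytes, ad: bytes, c: bytes, t: bytes) -> bool:
        # True stands for the success symbol, False for the failure symbol.
        return hmac.compare_digest(self._tag(iv, ad, c), t)


def conventional_decrypt(scheme: SeparatedAE, iv, ad, c, t):
    """Conventional decryption: release D's output only if V accepts."""
    return scheme.decrypt(iv, ad, c, t) if scheme.verify(iv, ad, c, t) else None
```

`decrypt` returns a plaintext even for a tampered ciphertext, while `conventional_decrypt` returns `None`; the correctness condition corresponds to `decrypt` returning M and `verify` returning `True` on honestly generated (C, T).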


3.2 Types of AE Schemes 

Classification Based on IVs. In order to achieve semantic security [22], AE
schemes must be probabilistic or stateful [5]. Usually the randomness or state
is focused into an IV [32]. How the IV is used restricts the scheme’s syntax and
the types of adversaries considered in the security notions:

1. Random IV. The environment chooses a random IV for each encryption, 
thus an adversary has no control over the choice of IV for each encryption. 
The generated IV must be sent along with the ciphertext so that the receiver 
can decrypt. 

2. Nonce IV. A distinct IV must be used for each encryption, thus an
adversary can choose but does not repeat nonce IV values in its encryption
queries. How the parties synchronize the nonce is typically left implicit.

3. Arbitrary IV. No restrictions on the IV are imposed, thus an adversary 
may choose any IV for encryption. Often a deterministic AE scheme does 
not even have an IV input, in which case an IV can be embedded into the 
associated data A , which gets authenticated along with the plaintext M but 
does not get encrypted; A is sent in the clear. 

In all IV cases the adversary can arbitrarily choose the IV input values to the 
decryption oracle. In some real-world protocols the decryption algorithm can be 
stateful [7], but such schemes are out of the scope of this paper, and schemes 
designed to be secure with deterministic decryption algorithms will be secure in 
those settings as well. 
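The three IV types differ only in what the encryption oracle lets the adversary choose. A minimal Python sketch of that bookkeeping follows; `EncryptionOracle` and its `query` method are hypothetical names, and the wrapped `encrypt` callable is an assumed stand-in for any scheme's \(\mathcal{E}_K\). The decryption oracle needs no such wrapper, since in all three cases the adversary may pick any IV for decryption.

```python
import secrets
from typing import Callable, Optional, Set, Tuple

Encrypt = Callable[[bytes, bytes], bytes]  # (iv, message) -> ciphertext


class EncryptionOracle:
    """Enforces the random / nonce / arbitrary IV rules on encryption queries."""

    def __init__(self, encrypt: Encrypt, iv_type: str, iv_len: int = 16) -> None:
        if iv_type not in ("random", "nonce", "arbitrary"):
            raise ValueError(f"unknown IV type: {iv_type}")
        self.encrypt, self.iv_type, self.iv_len = encrypt, iv_type, iv_len
        self._seen: Set[bytes] = set()

    def query(self, m: bytes, iv: Optional[bytes] = None) -> Tuple[bytes, bytes]:
        if self.iv_type == "random":
            if iv is not None:
                raise ValueError("random IV: the environment, not the adversary, picks the IV")
            iv = secrets.token_bytes(self.iv_len)  # sent along with the ciphertext
        elif iv is None:
            raise ValueError("nonce and arbitrary IV schemes take the IV from the caller")
        elif self.iv_type == "nonce":
            if iv in self._seen:
                raise ValueError("nonce IV: this value was already used for encryption")
            self._seen.add(iv)
        # arbitrary IV: any value, including a repeated one, is accepted
        return iv, self.encrypt(iv, m)
```

Keeping the policy in a wrapper means the same scheme can be analyzed under different IV assumptions simply by changing `iv_type`.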

While random and nonce IV schemes can achieve semantic security, arbitrary
IV schemes cannot, and therefore reduce to deterministic security. In the
latter case, the most common notions are “privacy up to repetition”, which is used
Table 2. The type of random oracle needed depending upon the class of AE scheme
considered

IV type     online encryption             offline encryption
random      random oracle                 random oracle
nonce       random oracle                 random oracle
arbitrary   random-up-to-prefix oracle    random-up-to-repetition oracle

for DAE [34], and “privacy up to prefix”, which is used for authenticated online
encryption [20]. In any case, we write $ to indicate the ideal oracle from which
an adversary tries to distinguish the real encryption oracle \(\mathcal{E}_K\), regardless of the
IV type. This means that the ideal $ oracle should be either the random
oracle, the random-up-to-repetition oracle, or the random-up-to-prefix oracle,
depending upon the IV. Each of the cases with its respective random oracle is
listed in Table 2. In order to avoid redundancy in the wording of the definitions,
whenever we write \(\Delta(\mathcal{E}_K, \ldots ; \$, \ldots)\), it is understood that the $ oracle is the one
appropriate for the AE scheme consisting of \(\mathcal{E}\).

Online Encryption/Decryption Algorithms. A further distinction is made 
between online schemes and the others. An AE scheme with online encryption 
is one in which the ciphertext can be output as the plaintext is received, namely 
we require that for each (K, IV, A) the resulting encryption function is online as
a function of the plaintext M. 

Although decryption in AE schemes can never be online due to the fact that
the message needs to be verified before it is output, we still consider schemes
which can compute the plaintext as the ciphertext is received. In particular, a
scheme with online decryption is one in which this plaintext-computing
algorithm, viewed as a function of the ciphertext and tag input, is online. Note
that in some schemes the tag could be received before the ciphertext, in which
case we still consider \(\mathcal{D}\) to be online (even though our new syntax implies that the
tag is always received after the ciphertext).


3.3 Conventional Security Definitions under the New Syntax 

Let \(\Pi = (\mathcal{E}, \mathcal{D}, \mathcal{V})\) denote an AE scheme as a family of algorithms indexed by
the key, IV, and associated data. With the new separated syntax we reformulate
the conventional security definitions, IND-CPA, IND-CCA, and INT-CTXT. As
mentioned above, the security notions are defined in terms of an unspecified $,
where the exact nature of $ depends on the type of IV allowed (cf. Table 2).
In the definitions the only fixed input to the algorithms is the key, indicated by
writing \(\mathcal{E}_K\) and \(\mathcal{D}_K\); all other inputs, such as the IV and associated data, can be
entered by the adversary.

Definition 1 (IND-CPA Advantage). Let D be a computationally bounded
adversary with access to one oracle O, and let \(K \xleftarrow{\$} \mathcal{K}\). Then the IND-CPA
advantage of D relative to \(\Pi\) is given by

\[ \mathbf{CPA}_{\Pi}(D) := \Delta_{D}(\mathcal{E}_K ; \$) . \]

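Definition 2 (IND-CCA Advantage). Let D be a computationally bounded
adversary with access to two oracles \(O_1\) and \(O_2\), such that D never queries
\(O_1 \hookrightarrow O_2\) nor \(O_2 \hookrightarrow O_1\), and let \(K \xleftarrow{\$} \mathcal{K}\). Then the IND-CCA advantage of D
relative to \(\Pi\) is given by

\[ \mathbf{CCA}_{\Pi}(D) := \Delta_{D}(\mathcal{E}_K, \mathcal{D}_K ; \$, \mathcal{D}_K) . \]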

Note that IND-CCA as defined above does not apply to the random IV setting.
When a random IV is used, the adversary is not prohibited from querying
\(O_2 \hookrightarrow O_1\). We introduce a version of IND-CCA below, which can be applied to all IV
settings.

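Definition 3 (IND-CCA' Advantage). Let D be an adversary as in Def. 2,
except D may now query \(O_2 \hookrightarrow O_1\), and let \(K \xleftarrow{\$} \mathcal{K}\). Then the IND-CCA'
advantage of D relative to \(\Pi\) is given by

\[ \mathbf{CCA'}_{\Pi}(D) := \Delta_{D}(\mathcal{E}_K, \mathcal{D}_K ; \$, \mathcal{D}_K) . \]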

Definition 4 (INT-CTXT Advantage). Let F be a computationally bounded
adversary with access to two oracles \(\mathcal{E}_K\) and \(\mathcal{V}_K\), such that F never queries
\(\mathcal{E}_K \hookrightarrow \mathcal{V}_K\). Then the INT-CTXT advantage of F relative to \(\Pi\) is given by

\[ \mathbf{CTXT}_{\Pi}(F) := \Pr\left[F^{\mathcal{E}_K, \mathcal{V}_K} \text{ forges}\right] , \]

where the probability is defined over the random key K and random coins of F.
Here, “forges” means that \(\mathcal{V}_K\) returns \(\top\) to the adversary.

4 Security under Release of Unverified Plaintext 

4.1 Security of Encryption 

We introduce the notion of plaintext awareness for symmetric-key encryption
schemes. An analysis of existing plaintext-aware schemes can be found in
Sect. 5. The formalization is similar to the one in the public-key setting [10]. Let
\(\Pi = (\mathcal{E}, \mathcal{D})\) denote an encryption scheme.

Definition 5 (PA1 Advantage). Let D be an adversary with access to two
oracles \(O_1\) and \(O_2\). Let E be an algorithm with access to the history of queries
made to \(O_1\) by D, called a PA1-extractor. We allow E to maintain state across
invocations. The PA1 advantage of D relative to E and \(\Pi\) is

\[ \mathbf{PA1}^{\mathsf{E}}_{\Pi}(D) := \Delta_{D}(\mathcal{E}_K, \mathcal{D}_K ; \mathcal{E}_K, \mathsf{E}) , \]

where \(K \xleftarrow{\$} \mathcal{K}\), and the probability is defined over the key K, the random coins
of D, and the random coins of E.
The adversary D tries to distinguish the case in which its second oracle \(O_2\) is
given by \(\mathcal{D}_K\) versus the case in which \(O_2\) is given by E. The task of E is to
mimic the outputs of \(\mathcal{D}_K\) given only the history of queries made to \(\mathcal{E}_K\) by D
(the key is not given to E). Note that D is allowed to make queries of the form
\(\mathcal{E}_K \hookrightarrow \mathsf{E}\); these can easily be answered by E via the query history.

PA2 is a strengthening of PA1 where the extractor no longer has access to
the query history of \(\mathcal{E}_K\); the extractor becomes a simulator for the decryption
algorithm. Note that in order for this to work, we cannot allow the adversaries
to make queries of the form \(\mathcal{E}_K \hookrightarrow \mathsf{E}\).

Definition 6 (PA2 Advantage). Let D be an adversary as in Def. 5 with
the added restriction that it may not ask queries of the form \(O_1 \hookrightarrow O_2\). Let E
be an algorithm, called a PA2-extractor. We allow E to maintain state across
invocations. The PA2 advantage of D relative to E and \(\Pi\) is

\[ \mathbf{PA2}^{\mathsf{E}}_{\Pi}(D) := \Delta_{D}(\mathcal{E}_K, \mathcal{D}_K ; \mathcal{E}_K, \mathsf{E}) , \]

where \(K \xleftarrow{\$} \mathcal{K}\), and the probability is defined over the key K, the random coins
of D, and the random coins of E.

An equivalent way of describing PA2 is via decryption independence (DI), which 
means that the adversary cannot distinguish between encryption and decryption 
under the same key and under different keys. The equivalence between PA2 and 
DI is proven in [3].

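Definition 7 (Decryption Independence). Let D be a distinguisher accepting
two oracles and not making queries of the form \(O_1 \hookrightarrow O_2\); then the DI advantage
of D relative to \(\Pi\) is

\[ \mathbf{DI}_{\Pi}(D) := \Delta_{D}(\mathcal{E}_K, \mathcal{D}_K ; \mathcal{E}_K, \mathcal{D}_L) , \]

where \(K, L \xleftarrow{\$} \mathcal{K}\) are independent.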
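As a concrete illustration of Definition 7, the following sketch gives a DI distinguisher against random IV CTR mode (the multi-block RIV-CTR of Example 1 in Sect. 5.2), in line with the entry marking CTR as insecure in the PA2 column of Table 1. Assumptions: HMAC-SHA256 truncated to 128 bits stands in for the PRF, and the function names are hypothetical. The distinguisher never forwards an \(O_1\) output unchanged to \(O_2\): it flips one bit of the ciphertext first, so the query is allowed.

```python
import hashlib
import hmac
import secrets

BLOCK = 16
MOD = 1 << (8 * BLOCK)


def prf(key: bytes, x: int) -> bytes:
    """Stand-in PRF F_K on n-bit integers (assumption: truncated HMAC-SHA256)."""
    return hmac.new(key, x.to_bytes(BLOCK, "big"), hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def riv_ctr_encrypt(key: bytes, blocks):
    c0 = secrets.randbelow(MOD)  # random IV chosen by the environment
    return c0, [xor(prf(key, (c0 + i) % MOD), m) for i, m in enumerate(blocks, start=1)]


def riv_ctr_decrypt(key: bytes, c0: int, blocks):
    return [xor(prf(key, (c0 + i) % MOD), c) for i, c in enumerate(blocks, start=1)]


def di_distinguisher(enc, dec) -> int:
    """Outputs 1 if decryption behaves as if it used the encryption key."""
    m = [bytes(BLOCK)]
    c0, c = enc(m)
    delta = b"\x01" + bytes(BLOCK - 1)
    released = dec(c0, [xor(c[0], delta)])  # modified ciphertext, so not O1 -> O2
    return 1 if released[0] == xor(m[0], delta) else 0
```

With the same key the distinguisher outputs 1 every time, and with an independent key L essentially never, so its DI advantage is close to 1; by the equivalence of PA2 and DI, RIV-CTR cannot be PA2.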

4.2 Security of Verification 

Integrity when releasing unverified plaintext is a modification of INT-CTXT
(Def. 4) to include the decryption oracle as a means to obtain unverified
plaintext. Let \(\Pi = (\mathcal{E}, \mathcal{D}, \mathcal{V})\) be an AE scheme with separate decryption and
verification.

Definition 8 (INT-RUP Advantage). Let F be a computationally bounded
adversary with access to three oracles \(\mathcal{E}_K\), \(\mathcal{D}_K\), and \(\mathcal{V}_K\), such that F never queries
\(\mathcal{E}_K \hookrightarrow \mathcal{V}_K\). Then the INT-RUP advantage of F relative to \(\Pi\) is given by

\[ \mathbf{INT\text{-}RUP}_{\Pi}(F) := \Pr\left[F^{\mathcal{E}_K, \mathcal{D}_K, \mathcal{V}_K} \text{ forges}\right] , \]

where the probability is defined over the key K and random coins of F. Here,
“forges” means the event of the oracle \(\mathcal{V}_K\) returning \(\top\) to the adversary.
5 Achieving Plaintext Awareness

5.1 Why Existing Schemes Do Not Achieve PA1 

In conventional AE schemes such as OCB, GCM, SpongeWrap, CCM, COPA, 
and McOE-G, a ciphertext is computed using some bijective function, and then 
a tag is appended to the ciphertext. The schemes achieve AE because the tag
ensures that not every ciphertext is valid. But if the tag is no longer checked,
then we cannot achieve PA1, as explained below. 

Let \(\Pi = (\mathcal{E}_K, \mathcal{D}_K)\) be a nonce or arbitrary IV encryption scheme; then we
can describe \(\Pi\) as follows,

\[ \mathcal{E}_K^{N,A}(M) = E_K^{N,A}(M) \,\|\, F_K^{N,A}(M) , \]

where \(E_K\) is length-preserving, i.e. \(|E_K^{N,A}(M)| = |M|\). One can view \(F_K^{N,A}\)
as the tag-producing function from a scheme such as GCM. In the following
proposition we prove that if \(\Pi\) is IND-CPA and PA1, then \(E_K\) cannot be
bijective for each (N, A), assuming either a nonce or arbitrary IV. Note that the
proposition only holds if \(\Pi\) is a nonce or arbitrary IV scheme.

Proposition 1. Say that \(E_K\) is bijective for all (N, A); then there exists an
adversary D such that for all extractors E, there exists an adversary \(D_1\) such
that

\[ 1 - \mathbf{CPA}_{\Pi}(D_1) \le \mathbf{PA1}^{\mathsf{E}}_{\Pi}(D) , \]

where D makes one \(O_1\) query and one \(O_2\) query, and \(D_1\) is as efficient as D plus
one query to E.

Proof. See [3]. □

We conclude that in order for a nonce or arbitrary IV scheme to be PA1 and
IND-CPA, \(E_K\) must either not be bijective, or not be length-preserving.

5.2 PA1 Random IV Schemes

We illustrate Def. 5 and the idea of an extractor by considering the CTR mode
with a random IV. 

Example 1 (RIV-CTR Extractor). Let \(F : \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n\) be a PRF.
For \(M_i \in \{0,1\}^n\), \(1 \le i \le \ell\), define RIV-CTR encryption as

\[ \mathcal{E}_K^{C_0}(M_1 \cdots M_\ell) = F_K(C_0 + 1) \oplus M_1 \,\|\, \cdots \,\|\, F_K(C_0 + \ell) \oplus M_\ell , \]

where \(C_0\) is selected uniformly at random from \(\{0,1\}^n\) for each encryption, and
decryption as

\[ \mathcal{D}_K^{C_0}(C_1 \cdots C_\ell) = F_K(C_0 + 1) \oplus C_1 \,\|\, \cdots \,\|\, F_K(C_0 + \ell) \oplus C_\ell . \]

We can define an extractor E for RIV-CTR as follows. Initially, E generates a
random key K' which it will use via \(F_{K'}\). Let \((C_0, C_1 \cdots C_\ell)\) denote an input to
E. Using \(C_0\), the extractor searches its history for a ciphertext with \(C_0\) as IV.
1. If such a ciphertext exists, we let \((C'_1 \cdots C'_m, M'_1 \cdots M'_m)\) denote the longest
corresponding \(\mathcal{E}_K\) query-response pair. Define \(K_i := C'_i \oplus M'_i\) for
\(1 \le i \le \min\{\ell, m\}\). Notice that \(K_i\) corresponds to the keystream generated by \(F_K\)
for \(1 \le i \le \min\{\ell, m\}\). For \(m < i \le \ell\) we generate \(K_i\) by \(F_{K'}(C_0 + i)\).

2. If there is no such ciphertext, then we generate \(K_i\) as \(F_{K'}(C_0 + i)\) for \(1 \le i \le \ell\).

Then we set \(\mathsf{E}^{C_0}(C_1 \cdots C_\ell) = C_1 \oplus K_1 \,\|\, C_2 \oplus K_2 \,\|\, \cdots \,\|\, C_\ell \oplus K_\ell\).
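The extractor of Example 1 can be sketched in Python. The sketch assumes a truncated HMAC-SHA256 as the PRF (for both \(F_K\) and the extractor's own \(F_{K'}\)), represents \(C_0\) as an integer modulo \(2^n\), and feeds the encryption query history in through a hypothetical `record` method; all names are illustrative. It follows the two cases above: keystream blocks are read off the longest matching query-response pair where possible and are otherwise generated with \(F_{K'}\), which also keeps repeated answers consistent.

```python
import hashlib
import hmac
import secrets

BLOCK = 16
MOD = 1 << (8 * BLOCK)


def prf(key: bytes, x: int) -> bytes:
    """Stand-in PRF on n-bit integers (assumption: truncated HMAC-SHA256)."""
    return hmac.new(key, x.to_bytes(BLOCK, "big"), hashlib.sha256).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def riv_ctr_encrypt(key: bytes, blocks):
    c0 = secrets.randbelow(MOD)
    return c0, [xor(prf(key, (c0 + i) % MOD), m) for i, m in enumerate(blocks, start=1)]


def riv_ctr_decrypt(key: bytes, c0: int, blocks):
    return [xor(prf(key, (c0 + i) % MOD), c) for i, c in enumerate(blocks, start=1)]


class RivCtrExtractor:
    """PA1 extractor for RIV-CTR; it never sees the real key K."""

    def __init__(self) -> None:
        self._k_prime = secrets.token_bytes(32)  # the extractor's own key K'
        self._history = []                       # (C0, C' blocks, M' blocks)

    def record(self, c0: int, c_blocks, m_blocks) -> None:
        """Called for every encryption query of D (PA1 access to the history)."""
        self._history.append((c0, list(c_blocks), list(m_blocks)))

    def __call__(self, c0: int, blocks):
        # Case 1: longest recorded query-response pair that used C0 as IV.
        matches = [(c, m) for iv, c, m in self._history if iv == c0]
        c_known, m_known = max(matches, key=lambda cm: len(cm[0]), default=([], []))
        out = []
        for i, c in enumerate(blocks, start=1):
            if i <= len(c_known):
                k_i = xor(c_known[i - 1], m_known[i - 1])  # K_i = C'_i xor M'_i
            else:
                k_i = prf(self._k_prime, (c0 + i) % MOD)   # K_i = F_K'(C0 + i)
            out.append(xor(c, k_i))
        return out
```

Answers agree with real decryption wherever the history determines the keystream; for positions the history does not cover, the answers come from \(F_{K'}\) and are therefore pseudorandom yet consistent across repeated queries.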

Proposition 2. Let D be a PA1 adversary for RIV-CTR making queries whose
lengths in number of blocks sum up to \(\sigma\); then



where \(D_1\) is an adversary which may not make the same query to both of its
oracles, and makes a total of \(\sigma\) queries with the same running time as D.

We refer the reader to the full version of the paper [3] for a proof of this
proposition. There, we also describe and analyze an extractor for the CBC mode.

In the following subsections we discuss ways of achieving PA1 assuming a 
nonce and arbitrary IV. Our basic building block will be a random IV PA1 
scheme. 

5.3 PA1 Nonce IV Schemes 

Nonce IV schemes are not necessarily PA1 in general. For example, CTR mode 
with a nonce IV is not PA1 and in [3] we show that IND-CPA is distinct from
PA1. Furthermore, coming up with a generic technique which transforms nonce 
IV schemes into PA1 schemes in an efficient manner is most likely not possible. 

If we assume that the nonce IV scheme, when used as a random IV scheme, 
is PA1, then there is an efficient way of making the nonce IV scheme PA1. Note 
that we already have an example of a scheme satisfying our assumption: nonce 
IV CTR mode is not PA1, but RIV-CTR is. 

Nonce Decoy. The nonce decoy method creates a random-looking IV from the nonce IV and forces the decryption algorithm to use the newly generated IV. Note that we are not merely transforming the nonce into a random nonce: the solution depends entirely on the fact that the decryption algorithm does not recompute the newly generated IV from the nonce IV.

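Let $\Pi = (\mathcal{E}, \mathcal{D}, \mathcal{V})$ be a nonce-IV-based AE scheme. For simplicity assume $\mathcal{IV} := \{0,1\}^n$, so that IVs are of a fixed length $n$. We prepare a pseudo-random function $G_{K'} : \mathcal{IV} \to \mathcal{IV}$ with an independent key $K'$. We then construct an AE scheme $\overline{\Pi} = (\overline{\mathcal{E}}, \overline{\mathcal{D}}, \overline{\mathcal{V}})$ as follows.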


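$\overline{\mathcal{E}}^{IV,A}_{K,K'}(M)$:
    $IV^* \leftarrow G_{K'}(IV)$
    $(C, T) \leftarrow \mathcal{E}^{IV^*,A}_K(M)$
    $C \leftarrow IV^* \| C$
    return $(C, T)$

$\overline{\mathcal{D}}^{A}_{K}(C, T)$:
    $IV \| C \leftarrow C$
    $M \leftarrow \mathcal{D}^{IV,A}_K(C, T)$
    return $M$

$\overline{\mathcal{V}}^{IV,A}_{K,K'}(C, T)$:
    $IV^* \leftarrow G_{K'}(IV)$
    $IV' \| C \leftarrow C$
    $b \leftarrow \mathcal{V}^{IV',A}_K(C, T)$
    return $(IV^* = IV' \text{ and } b = \top)\ ?\ \top : \bot$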
Note that the decryption algorithm $\overline{\mathcal{D}}$ does not make use of $K'$ or $IV$. If the decryption algorithm recomputed $IV^*$ using $K'$ and $IV$, then $\overline{\Pi}$ would not be PA1. Furthermore, one can combine $\overline{\mathcal{D}}$ and $\overline{\mathcal{V}}$ in order to create a scheme which rejects ciphertexts when the IV it receives does not come from an encryption query.

The condition that $\Pi$ with random IVs be PA1 is necessary and sufficient for $\overline{\Pi}$ to be PA1, assuming $G$ is a PRF; see [3] for the proof of this statement. In Sect. 6.2 we discuss what the nonce decoy does for INT-RUP.
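To make the data flow of the nonce decoy concrete, here is a minimal Python sketch. The underlying scheme $\Pi$ is a hypothetical encrypt-then-MAC over a toy CTR mode (HMAC-SHA256 as both PRF and MAC), and $G_{K'}$ is HMAC-SHA256 truncated to $n = 128$ bits; none of these choices comes from the text, and the sketch makes no security claim. It only shows that `decoy_dec` needs neither $K'$ nor the nonce, while `decoy_ver` recomputes $IV^*$.

```python
import hashlib
import hmac
import os

N_BYTES = 16  # IV length n = 128 bits (assumption for this sketch)


def _prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def _ctr(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy CTR: keystream block i is HMAC(key, IV + i) truncated to n bits."""
    c0 = int.from_bytes(iv, "big")
    out = bytearray()
    for j in range(0, len(data), N_BYTES):
        ctr = (c0 + j // N_BYTES + 1) % (1 << (8 * N_BYTES))
        ks = _prf(key, ctr.to_bytes(N_BYTES, "big"))[:N_BYTES]
        out += bytes(a ^ b for a, b in zip(data[j:j + N_BYTES], ks))
    return bytes(out)


def _mac_input(iv: bytes, ad: bytes, c: bytes) -> bytes:
    return iv + len(ad).to_bytes(8, "big") + ad + c  # unambiguous encoding


# Hypothetical underlying scheme Pi = (E, D, V): CTR under key[0], HMAC tag under key[1].
def pi_enc(key, iv, ad, msg):
    c = _ctr(key[0], iv, msg)
    return c, _prf(key[1], _mac_input(iv, ad, c))


def pi_dec(key, iv, ad, c, tag):
    return _ctr(key[0], iv, c)  # releases plaintext without checking the tag


def pi_ver(key, iv, ad, c, tag):
    return hmac.compare_digest(_prf(key[1], _mac_input(iv, ad, c)), tag)


# Nonce decoy on top of Pi.
def decoy_enc(key, k2, nonce, ad, msg):
    iv_star = _prf(k2, nonce)[:N_BYTES]  # IV* <- G_{K'}(IV)
    c, tag = pi_enc(key, iv_star, ad, msg)
    return iv_star + c, tag              # C <- IV* || C


def decoy_dec(key, ad, c, tag):
    iv, body = c[:N_BYTES], c[N_BYTES:]  # IV || C <- C; uses neither K' nor the nonce
    return pi_dec(key, iv, ad, body, tag)


def decoy_ver(key, k2, nonce, ad, c, tag):
    iv_star = _prf(k2, nonce)[:N_BYTES]  # recompute IV* from the nonce
    iv, body = c[:N_BYTES], c[N_BYTES:]
    b = pi_ver(key, iv, ad, body, tag)
    return hmac.compare_digest(iv_star, iv) and b
```

Tampering with the transmitted IV still lets `decoy_dec` release a plaintext, but `decoy_ver` rejects, which mirrors the split between decryption and verification in the construction.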


5.4 PA1 Arbitrary IV Schemes 


PRF-to-IV. Using a technique similar to MAC-then-Encrypt [9], we can turn 
a random IV PA1 scheme into an arbitrary IV PA1 scheme. 

The idea behind the PRF-to-IV method is to evaluate a VIL PRF over the input to the scheme and then to use the resulting output as an IV for the random IV encryption scheme. Let $\Pi = (\mathcal{E}, \mathcal{D}, \mathcal{V})$ be a random IV PA1 scheme taking IVs from $\{0,1\}^n$, and let $G : \{0,1\}^k \times \{0,1\}^* \to \{0,1\}^n$ be a VIL PRF.


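$\overline{\mathcal{E}}^{IV,A}_{K,K'}(M)$:
    $\widetilde{IV} \leftarrow G_{K'}(IV \| A \| M)$
    $(C, T) \leftarrow \mathcal{E}^{\widetilde{IV},A}_K(M)$
    return $(C, \widetilde{IV} \| T)$

$\overline{\mathcal{D}}^{IV,A}_{K}(C, \widetilde{IV} \| T)$:
    $M \leftarrow \mathcal{D}^{\widetilde{IV},A}_K(C, T)$
    return $M$

$\overline{\mathcal{V}}^{IV,A}_{K,K'}(C, \widetilde{IV} \| T)$:
    $M \leftarrow \overline{\mathcal{D}}^{IV,A}_K(C, \widetilde{IV} \| T)$
    $\widetilde{IV}^* \leftarrow G_{K'}(IV \| A \| M)$
    $b \leftarrow \mathcal{V}^{\widetilde{IV},A}_K(C, T)$
    return $(\widetilde{IV} = \widetilde{IV}^* \text{ and } b = \top)\ ?\ \top : \bot$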


The PRF-to-IV method is more robust than the nonce decoy, since $\overline{\mathcal{D}}$ can only use the transmitted $\widetilde{IV}$ to decrypt properly: recomputing it would require the plaintext $M$.

The condition that $\Pi$ with random IVs be PA1 is necessary and sufficient for $\overline{\Pi}$ to be PA1, assuming $G$ is a VIL PRF; see [3] for the proof of this statement. Note that the PRF-to-IV method is the basic structure behind SIV, BTM, and HBS. We show that the PRF-to-IV method is INT-RUP in Sect. 6.2.
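A minimal Python sketch of the PRF-to-IV transform, under loudly stated assumptions: $G$ is HMAC-SHA256 over length-prefixed inputs truncated to $n = 128$ bits, and the underlying random IV scheme is a toy CTR mode whose tag $T$ is empty and whose $\mathcal{V}$ always accepts, so verification reduces to the check $\widetilde{IV} = \widetilde{IV}^*$. With these choices the construction resembles SIV; all function names are hypothetical.

```python
import hashlib
import hmac
import os

N_BYTES = 16  # n = 128 bits (assumption for this sketch)


def g_vil(k2: bytes, *parts: bytes) -> bytes:
    """Stand-in VIL PRF G_{K'}: HMAC-SHA256 over length-prefixed parts, truncated to n bits."""
    mac = hmac.new(k2, digestmod=hashlib.sha256)
    for p in parts:
        mac.update(len(p).to_bytes(8, "big") + p)
    return mac.digest()[:N_BYTES]


def riv_ctr(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Underlying random IV scheme (toy CTR); its T is empty and its V always accepts."""
    c0 = int.from_bytes(iv, "big")
    out = bytearray()
    for j in range(0, len(data), N_BYTES):
        ctr = (c0 + j // N_BYTES + 1) % (1 << (8 * N_BYTES))
        ks = hmac.new(key, ctr.to_bytes(N_BYTES, "big"), hashlib.sha256).digest()[:N_BYTES]
        out += bytes(a ^ b for a, b in zip(data[j:j + N_BYTES], ks))
    return bytes(out)


def prf_to_iv_enc(k, k2, iv, ad, msg):
    riv = g_vil(k2, iv, ad, msg)      # IV~ <- G_{K'}(IV || A || M)
    return riv_ctr(k, riv, msg), riv  # (C, IV~ || T) with T empty


def prf_to_iv_dec(k, ct, tag):
    return riv_ctr(k, tag, ct)        # only the transmitted IV~ is usable; K' not needed


def prf_to_iv_ver(k, k2, iv, ad, ct, tag):
    msg = prf_to_iv_dec(k, ct, tag)
    return hmac.compare_digest(g_vil(k2, iv, ad, msg), tag)  # IV~ == IV~* ?
```

Because $\widetilde{IV}$ depends on $M$, the whole message must be read before the first ciphertext block can be produced, which is exactly the limitation discussed next.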

Online Encryption. Since the PRF needs to be computed over the entire message before the message is encrypted, the PRF-to-IV method does not allow for online encryption. Recall that an encryption scheme has online encryption if for all $(K, IV, A)$ the resulting function is online. Examples of such schemes include COPA and McOE-G.

If we want encryption and decryption to both be online in the arbitrary IV setting, then a large amount of ciphertext expansion is necessary; otherwise a distinguisher similar to the one used in the proof of Prop. 1 can be constructed.

An encryption scheme $\Pi = (\mathcal{E}, \mathcal{D})$ is online if for some $n$ there exist functions $f_i$ and $f'_i$ such that

$$\mathcal{E}_K(M) = f_n(M_1)\, f_{2n}(M_1 M_2) \cdots f_{jn}(M_1 M_2 \cdots M_j)\, f'_{|M|}(M),$$

where $j = \lfloor (|M| - 1)/n \rfloor$ and $M_i$ is the $i$th $n$-bit block of $M$. If the encryption scheme has online decryption as well, then the decryption algorithm can start decrypting each “block” of ciphertext, that is,

$$\mathcal{D}_K\big(f_n(M_1)\, f_{2n}(M_1 M_2) \cdots f_{in}(M_1 M_2 \cdots M_i)\big) = M_1 M_2 \cdots M_i,$$

for all $i \le j$.

Proposition 3. Let $\Pi = (\mathcal{E}, \mathcal{D})$ be an encryption scheme where $\mathcal{E}$ is $n$-online for all $K$, $IV$, and $A$, and $\mathcal{D}$ is online as well. Then there exists a PA1 adversary $D$ such that for all extractors $E$ there exists an IND-CPA adversary $D_1$ such that

$$1 - \mathrm{CPA}_\Pi(D_1) \le \mathrm{PA1}^{E}_\Pi(D),$$

where $D$ makes one $\mathcal{O}_1$ query and one $\mathcal{O}_2$ query, and $D_1$ is as efficient as $D$ plus one query to $E$.

Proof. See [3]. □

Example 2. In certain scenarios, padding the plaintext is sufficient for PA1. Doing so makes schemes such as McOE-G secure in the sense of PA1, while keeping encryption and decryption online. The cost is a substantial expansion of the ciphertext. For the case of McOE-G, the length of the ciphertext becomes roughly twice the size of its plaintext.

It is important to note that McOE-G is based on an $n$-bit block cipher, and each $n$-bit message block is encrypted (after it is XORed with some state values) via the block cipher call. Since the underlying block cipher is assumed to be a strong pseudorandom permutation (SPRP), we can pad a message $M = M_1 M_2 \cdots M_\ell$ (each $M_i$ is an $n/2$-bit string) as $0^{n/2} M_1 \,\|\, 0^{n/2} M_2 \,\|\, \cdots \,\|\, 0^{n/2} M_\ell$ and then encrypt this padded message using McOE-G. So each block cipher call processes $0^{n/2} M_i$ for some $i$. This “encode-then-encipher” scheme is PA1, as shown in [3].
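The encoding step of Example 2 is easy to state in code. The sketch below (hypothetical function names, byte-aligned $n$) only builds and strips the $0^{n/2}$ padding; it does not implement McOE-G, and the rule that decoding rejects a block whose upper half is non-zero is an assumption added for illustration.

```python
def encode_half_zero(msg_halves, n_bytes):
    """Map M_1 ... M_l (each n/2 bits) to 0^{n/2} M_1 || ... || 0^{n/2} M_l."""
    half = n_bytes // 2
    out = bytearray()
    for m in msg_halves:
        if len(m) != half:
            raise ValueError("each M_i must be exactly n/2 bits")
        out += bytes(half) + m  # n/2 zero bits followed by M_i
    return bytes(out)


def decode_half_zero(padded, n_bytes):
    """Inverse map; returns None if a block's upper half is not 0^{n/2} (assumed rule)."""
    half = n_bytes // 2
    blocks = [padded[j:j + n_bytes] for j in range(0, len(padded), n_bytes)]
    if any(len(b) != n_bytes or b[:half] != bytes(half) for b in blocks):
        return None
    return [b[half:] for b in blocks]
```

With $n = 128$, two 8-byte halves become 32 bytes of input to the block cipher, which matches the roughly doubled ciphertext length mentioned above.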

Example 3. If we do not require decryption to be online, then we can achieve PA1 without significant ciphertext expansion. An example of a scheme that falls into this category is the recently introduced APE mode [2], whose decryption is backward (and hence not online). A proof of this is given in [3].


5.5 PA2 Schemes 

Most AE schemes are proven to be IND-CPA and INT-CTXT, which allows one to achieve IND-CCA [9] assuming verification works correctly. In order to be as efficient as possible, the underlying encryption schemes in AE schemes are designed to achieve only IND-CPA and not IND-CCA, since achieving IND-CCA for encryption usually requires significantly more operations. For example, GCM, SIV, BTM, and HBS all use CTR mode for encryption, yet CTR mode is not IND-CCA. Since IND-CPA+PA2 is equivalent to IND-CCA', none of these schemes achieves PA2.

A scheme such as APE also cannot achieve IND-CCA' because its decryption is online “in reverse”. If $(\mathcal{E}_K, \mathcal{D}_K)$ denotes APE, then an adversary can query


How to Securely Release Unverified Plaintext in Authenticated Encryption 




$\mathcal{E}_K(M_1 M_2) = C_1 C_2$ and then $\mathcal{D}_K(C'_1 C_2)$, which equals $M'_1 M_2$. But if an adversary interacts with $(\$, \mathcal{D}_K)$ (see Def. 3), then $\mathcal{D}_K(C'_1 C_2)$ will most likely not output $M'_1 M_2$.

Existing designs which do achieve PA2 include those which are designed to be IND-CCA', such as the solutions presented by Bellare and Rogaway, Desai [19], and Shrimpton and Terashima [36]. These solutions cannot be online, and they are usually at least “two-pass”, meaning the input is processed at least twice.

6 Integrity in the INT-RUP Setting 

6.1 INT-RUP Attack 

Several AE schemes become insecure if unverified plaintext is released. In Proposition 4 we show that OCB [33] and COPA [4] are not secure in the RUP setting.

The strategy of our attack is similar to that of Bellare and Micciancio on the XHASH hash function [8]. However, our attack is an improved version that solves a system of linear equations in GF(2) with only half the number of equations and variables.

The attack works by first querying the encryption oracle under nonce $N$ to get a valid ciphertext and tag pair. Then, two decryption queries are made under the same nonce $N$. Using the resulting plaintexts, a system of linear equations is set up which, when solved, gives a forgery with high probability. A formal description of the attack is given in [3].

Proposition 4. For OCB and COPA, for all $\ell \ge n$ there exists an adversary $A$ such that

$$\mathrm{INT\text{-}RUP}_\Pi(A) \ge 1 - 2^{n-\ell},$$

where $A$ makes one encryption query and two decryption queries, each consisting of $\ell$ blocks of $n$ bits. Then, the adversary solves a system of linear equations in GF(2) with $n$ equations and $\ell$ unknowns.


6.2 Nonce Decoy and PRF-to-IV 

In Sect. 5 we introduced a way of turning a random IV PA1 scheme into a nonce IV PA1 scheme, the nonce decoy, and a way of turning a random IV PA1 scheme into an arbitrary IV PA1 scheme, the PRF-to-IV method. Here we consider what happens to INT-RUP when the two methods are applied.

The nonce decoy adds some integrity to the underlying random IV PA1 scheme. Using the notation from Sect. 5.3, $\Pi$ needs to satisfy a slightly weaker form of INT-RUP in order for $\overline{\Pi}$ to be INT-RUP. Concretely, $\Pi$ only needs to be INT-RUP against adversaries which use IVs that are the result of an encryption query. Furthermore, this requirement on $\Pi$ is sufficient to prove that $\overline{\Pi}$ is INT-RUP.




E. Andreeva et al. 


Naturally, if $\Pi$ is INT-RUP, then $\overline{\Pi}$ is INT-RUP as well. In fact, if $\Pi$ is INT-RUP against adversaries which use IVs that are the result of an encryption query, then $\overline{\Pi}$ is INT-RUP. These statements and their proofs can be found in [3].

The PRF-to-IV method is a much stronger transform than the nonce decoy. Following the notation from Sect. 5.4, we do not need to assume anything about the underlying random IV scheme $\Pi$ in order to prove that $\overline{\Pi}$ is INT-RUP.

7 Conclusions 

Many practical applications need an AE scheme that can securely output plaintext before verification. We formalized security under the release of unverified plaintext (RUP) to adversaries by separating decryption and verification.

Two symmetric-key notions of plaintext awareness (PA1 and PA2) were introduced. In the RUP setting, privacy is achieved as a combination of IND-CPA and PA1 or PA2. For integrity, we introduced the INT-RUP notion as an extension of INT-CTXT, where a forger may abuse unverified plaintext. We connected our notions of privacy and integrity in the RUP setting to existing security notions, and saw that the relations and separations depended on the IV type.

The CTR and CBC modes with a random IV achieve IND-CPA+PA1, but this is non-trivial for nonce-based or deterministic encryption schemes. Our results showed that many AE schemes such as GCM, CCM, COPA, and McOE-G are not secure in the RUP setting. We provided remedies for both nonce-based and deterministic AE schemes. For the former case, we introduced the nonce decoy technique, which transforms a nonce into a random-looking IV. The PRF-to-IV method converts random IV PA1 schemes into arbitrary IV PA1 schemes. We showed that deterministic AE schemes cannot be PA1, unless the decryption is offline (as in APE) or there is significant ciphertext expansion.

Future Work. Given that our PRF-to-IV method is rather inefficient, we leave it as an open problem to efficiently modify any encryption-only scheme into an AE scheme that is INT-RUP. A related problem is to fix OCB and COPA to be INT-RUP in an efficient way. The PA1 solutions we provide all start with the assumption that the nonce IV or arbitrary IV scheme is PA1 when a random IV is used instead. An interesting problem is to find alternative ways of constructing nonce IV and arbitrary IV PA1 schemes. A problem of theoretical interest is to find a non-pathological random IV encryption scheme that is not PA1. In some applications, formalizing security in the RUP setting as IND-CPA+PA1 and INT-RUP may be sufficient. It is interesting to investigate how well this formalization reflects the problems encountered in real-world implementations, to see where PA2 may also be necessary, and how blockwise adaptive adversaries [25] play a role in the RUP setting. Finally, our paper does not address the behavior of applications which use unverified plaintext. A further understanding of the security risks involved in using unverified plaintext in applications is necessary.
Abstract. At FSE 2014, an authenticated encryption mode COBRA [4], based on a pseudorandom permutation (PRP) blockcipher, and POET [3], based on an Almost XOR-Universal (AXU) hash and a strong pseudorandom permutation (SPRP), were proposed. A few weeks later, COBRA and a simple variant of the original proposal of POET (due to a forging attack [13] on the original proposal), with AES as the underlying blockcipher, were submitted to CAESAR [1], a competition for authenticated encryption (AE). In this paper, we show a forging attack on the mode COBRA based on any $n$-bit blockcipher. Our attack on COBRA requires about $O(n)$ queries with success probability of about 1/2. This disproves the claim proved in the FSE 2014 paper. We also show both privacy and forging attacks on the parallel version of POET, denoted POET-m. In the case of the modes POET and POE (the underlying modes for encryption), we demonstrate a distinguishing attack making only one encryption query when we instantiate the underlying AXU hash function with some other AXU hash function, namely a uniform random involution. Thus, our result violates the designers’ main claim (Theorem 8.1 in [3]). However, the attacks cannot be extended to the specifications of POET submitted to the CAESAR competition.

Keywords: Authenticated Encryption, COBRA, POET, Distinguishing 
Attack and Forging Attack. 


1 Introduction 

A common application of cryptography is to implement a secure channel between two or more users and then to exchange information over that channel. These users can initially set up a one-time shared key. Otherwise, a typical implementation first calls a key-exchange protocol to establish a shared key or a session key (used only for the current session). Once the users have a shared key, either through the initial key set-up or key exchange, they use this key to authenticate and encrypt the transmitted information using efficient symmetric-key algorithms such as a message authentication code $\mathsf{Mac}(\cdot)$, a pseudorandom function $\mathsf{Prf}(\cdot)$ and (possibly tweakable symmetric-key) encryption $\mathsf{Enc}(\cdot)$, respectively. The encryption $\mathsf{Enc}$ provides privacy or confidentiality of the plaintext $M$. The message authentication code $\mathsf{Mac}$ and pseudorandom function $\mathsf{Prf}$ provide data integrity by authenticating the transmitted message $(M, A)$, a pair of a plaintext $M$ and associated data $A$. $\mathsf{Mac}$ also provides user authenticity (protection from impersonation). An Authenticated Encryption scheme (or simply AE) serves both purposes in an integrated manner. An authenticated encryption scheme AE has two functionalities: one, called tagged encryption, essentially combines message authentication code and encryption, and the other combines verification and decryption algorithms.

1. Tagged Encryption $\mathsf{AE.enc}_k$: On an input message $M$ from a message space $\mathcal{M} \subseteq \{0,1\}^*$ and associated data $A$ from an associated data space $\mathcal{D} \subseteq \{0,1\}^*$, it returns a tagged ciphertext¹ $Z \in \{0,1\}^*$.

2. Verified Decryption $\mathsf{AE.dec}_k$: On an input tagged ciphertext $Z$ and associated data $A$, it returns a plaintext $M$ when $Z = \mathsf{AE.enc}_k(M, A)$, called a valid tagged ciphertext. Otherwise, for every invalid tagged ciphertext $Z$, it returns a special symbol $\bot$.

Note that both algorithms take the shared key $k$ from a key space $\mathcal{K} = \{0,1\}^{L_{\mathrm{key}}}$, where $L_{\mathrm{key}}$ denotes the key size. The key includes keys for an underlying blockcipher, masking keys, etc. Some constructions derive more keys by invoking the blockcipher on different constant inputs.

Privacy and Authenticity Advantage. Informally speaking, an AE scheme is said to have privacy if its tagged ciphertexts behave like uniform random strings for any adaptively chosen plaintexts. More formally, let $A$ be an oracle adversary which can make queries to $\mathsf{AE.enc}$ adaptively. Let $\$$ be a random oracle which returns a uniform random string for every new query. We define the privacy advantage of $A$ against AE to be

$$\mathrm{Adv}^{\mathrm{priv}}_{\mathrm{AE}}(A) := \big|\Pr[A^{\mathsf{AE.enc}_K} = 1] - \Pr[A^{\$} = 1]\big|,$$

where the two probabilities are taken over $\$$, $K$ (usually chosen uniformly from the key space $\mathcal{K}$) and the random coins of $A$. Similarly, we define the authenticity advantage of $A$ as

$$\mathrm{Adv}^{\mathrm{auth}}_{\mathrm{AE}}(A) := \Pr[A^{\mathsf{AE.enc}_K} = Z,\ \mathsf{AE.dec}_K(Z) \ne \bot \text{ and } Z \text{ is fresh}],$$

where the probability is taken over $K$ and the random coins of $A$. By fresh, we mean that $Z$ is not a response to an encryption query of $A$.


1.1 Two AE Schemes COBRA and POET Submitted to CAESAR 

CAESAR [1] is an ongoing competition for authenticated encryption schemes. The final goal of the competition is to identify a portfolio of different authenticated encryption schemes for different applications and environments. Fifty-seven schemes have been submitted. AES-COBRA and POET are two such


1 A tagged ciphertext usually consists of a ciphertext and tag. However, there may 
not exist a clear separation between ciphertext and tag. 
submissions. Variants of these two schemes were published earlier at FSE 2014. In [13], Guo et al. demonstrated a forging attack against POET making only one encryption query. So the designers of POET modified it accordingly to resist this forging attack and submitted the revised version to CAESAR.


1.2 Our Contribution 

In this paper, we investigate the resistance of the two authenticated encryption schemes COBRA and POET against forging and privacy attacks. The paper is essentially divided into two parts: Section 4 describes the forging attack on COBRA and Section 5 describes the forging and privacy analysis of the POET mode and its parallel variant, called POET-m.

1. Attack on COBRA. In this paper, we show a forging attack on the submitted version of AES-COBRA. In fact, the attack works for the mode COBRA based on any blockcipher. Thus it disproves the claim stated in [4]. The authenticity advantage of our proposed algorithm is about 1/2, and it makes about $2n$ encryption queries, where $n$ is the block size of the underlying blockcipher.

2. Analysis of POET and POET-m. The designers of POET have recommended a parallel version, denoted POET-m. We provide distinguishing and forging attacks on it. Moreover, the designers claimed security of POET for an arbitrary AXU (almost XOR universal) hash function (the formal definition of an AXU hash function is given in Section 2). Here we disprove their claim by showing a distinguishing attack for a special choice of AXU hash, namely a uniform random involution. Thus, the security proofs of these claims are flawed. We also extend this to a forging attack. All these attack algorithms make very few encryption queries and succeed with probability close to one.

We would like to note that while COBRA is affected by our attack, the instantiation of the POET candidate which uses specific AXU hash functions is not affected.

2 Basics of Almost XOR Universal (AXU) Hash 

2.1 Notation and Basics 

In this paper, we fix a positive integer $n$ which denotes the block size of the underlying blockcipher. We mostly use AES (Advanced Encryption Standard) with a 128-bit key as the underlying blockcipher, and in this case $n = 128$.

Binary Field. We identify the set $\{0,1\}^n$ with the binary field of size $2^n$. An $n$-bit string $\alpha = \alpha_0 \alpha_1 \cdots \alpha_{n-1}$, $\alpha_i \in \{0,1\}$, can be equivalently viewed as a polynomial $\alpha(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_{n-1} x^{n-1}$. For notational simplicity, we write the concatenation of two binary strings $\alpha$ and $\beta$ as $\alpha\beta$. The field addition between two $n$-bit strings is bit-wise addition $\oplus$ (we also use “$+$”). Let us fix a primitive polynomial $p(x)$ of degree $n$. Field multiplication between two $n$-bit strings $\alpha$ and $\beta$ is defined as the binary string corresponding to the polynomial $\alpha(x)\beta(x) \bmod p(x)$. We denote the multiplication of $\alpha$ and $\beta$ as $\alpha \cdot \beta$. Thus, the zero polynomial $0$ and the constant polynomial $1$ are the additive and multiplicative identities, respectively. Moreover, $x$ is a primitive element since the polynomial $p(x)$ is primitive.
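Field multiplication as defined above can be sketched with Python integers, taking bit $i$ as the coefficient $\alpha_i$ of $x^i$ (this bit-order convention is an assumption; the text only fixes the polynomial view). The helper `mult_order` (a hypothetical name) lets one check that $x$ is a primitive element for a primitive $p(x)$; the usage below takes $n = 8$ with $p(x) = x^8 + x^4 + x^3 + x^2 + 1$ rather than $n = 128$.

```python
def gf_mul(a: int, b: int, n: int, poly: int) -> int:
    """Multiply a and b in GF(2^n) = GF(2)[x]/(p(x)).

    Bit i of an integer holds the coefficient of x^i; `poly` encodes p(x)
    including its x^n term. Inputs must be smaller than 2^n.
    """
    result = 0
    while b:
        if b & 1:      # add the current a * x^k for each set bit k of b
            result ^= a
        b >>= 1
        a <<= 1        # a <- a * x
        if a >> n:     # degree n reached: reduce modulo p(x)
            a ^= poly
    return result


def mult_order(a: int, n: int, poly: int) -> int:
    """Smallest k >= 1 with a^k = 1 in GF(2^n); `a` must be non-zero."""
    k, acc = 1, a
    while acc != 1:
        acc = gf_mul(acc, a, n, poly)
        k += 1
    return k
```

For example, `mult_order(2, 8, 0x11D)` gives 255: the element $x$ generates all non-zero elements, as expected for a primitive $p(x)$.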

2.2 Almost XOR Universal (AXU) Hash 

Universal hash functions and their close variants, such as strongly universal and AXU hash functions, are information-theoretic notions which are used as building blocks of several cryptographic constructions, e.g., message authentication codes, domain extension of pseudorandom functions, extractors [17], quasi-randomness and other combinatorial objects.

Definition 1 (AXU Hash Function). A function family $F_L : \mathcal{M} \to \{0,1\}^n$ indexed by $L \in \mathcal{L}$ is called $\epsilon$-AXU if for all $x \ne x' \in \mathcal{M}$ and $\delta \in \{0,1\}^n$, $\Pr_L[F_L(x) \oplus F_L(x') = \delta] \le \epsilon$, where $L$ is chosen uniformly from $\mathcal{L}$.

Examples. Field Multiplier. Let $L \in \{0,1\}^n$ be chosen uniformly. Then $F_L(x) = L \cdot x$ (field multiplication on $\{0,1\}^n$) is $2^{-n}$-AXU.

Polynomial Hash. The polynomial hash [20] is one of the popular universal hash functions and can be computed efficiently by Horner’s rule (in the same way as the CBC message authentication code [2, 6]).

Definition 2. We define the polynomial hash indexed by $L \in \{0,1\}^n$ over the domain $(\{0,1\}^n)^+ := \bigcup_{m \ge 1} (\{0,1\}^n)^m$ as

$$\mathrm{poly}_L(a_d, a_{d-1}, \ldots, a_0) = a_0 + a_1 \cdot L + \cdots + a_{d-1} \cdot L^{d-1} + a_d \cdot L^d,$$

where $a_0, a_1, \ldots, a_d \in \{0,1\}^n$ and $L^i$ denotes $L \cdot L \cdots L$ ($i$ times).

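It is easy to see that the function mapping $(a_1, \ldots, a_d)$ to $a_1 \cdot L + \cdots + a_d \cdot L^d$ is a $d/2^n$-AXU hash function over the domain $(\{0,1\}^n)^d$. To see this, let $(a_1, \ldots, a_d) \ne (b_1, \ldots, b_d)$ and $c \in \{0,1\}^n$. Then

$$c + (a_1 - b_1) \cdot L + \cdots + (a_d - b_d) \cdot L^d$$

is a non-zero polynomial in $L$ of degree at most $d$, and hence it has at most $d$ distinct roots. Since $L$ is uniform over $2^n$ values, the differential probability is at most $d/2^n$.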
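The following Python sketch evaluates $\mathrm{poly}_L$ by Horner's rule over a toy field $\mathrm{GF}(2^8)$ (an assumption made to keep the example small) and cross-checks it against direct term-by-term evaluation. Coefficients are passed highest degree first, i.e., in the order $a_d, \ldots, a_0$ of Definition 2; the function names are hypothetical.

```python
N_BITS = 8     # toy field GF(2^8); the text keeps n general (e.g. n = 128)
POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial


def gf_mul(a: int, b: int) -> int:
    """Field multiplication in GF(2^8) modulo POLY (bit i = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N_BITS:
            a ^= POLY
    return r


def gf_pow(a: int, e: int) -> int:
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def poly_hash(L: int, coeffs: list) -> int:
    """poly_L(a_d, ..., a_0) by Horner's rule; coeffs = [a_d, ..., a_0]."""
    h = 0
    for a in coeffs:
        h = gf_mul(h, L) ^ a  # h <- h * L + a
    return h


def poly_hash_direct(L: int, coeffs: list) -> int:
    """a_0 + a_1 L + ... + a_d L^d evaluated term by term (cross-check)."""
    d = len(coeffs) - 1
    acc = 0
    for j, a in enumerate(coeffs):  # coeffs[j] is the coefficient of L^(d - j)
        acc ^= gf_mul(a, gf_pow(L, d - j))
    return acc
```

Appending a zero constant term, e.g. `poly_hash(L, [a_d, ..., a_1, 0])`, gives the map $(a_1, \ldots, a_d) \mapsto a_1 L + \cdots + a_d L^d$; for two distinct inputs and any fixed $c$, at most $d$ of the 256 keys $L$ produce the difference $c$, as the root-counting argument predicts.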

Four-Round AES. AES (for 128-bit keys) has ten rounds. However, it has been shown that four-round AES has good differential properties. More formally, Daemen et al. showed that four-round AES is a family of $2^{-113}$-AXU functions under the simplified assumption that all four round keys are uniform and independent.


Uniform Random Involution. The uniform random function from $\{0,1\}^n$ to itself is a $2^{-n}$-AXU hash function. A function $f : \{0,1\}^n \to \{0,1\}^n$ is called an involution if $f$ is its own inverse (so it must be a permutation). Let $I_n$ denote a random involution whose responses are defined according to the following procedure. After responding to every query, it updates two sets: the set of all queries $D$ and the set of all responses $R$. On a query $x \notin D \cup R$, it returns an element chosen uniformly from the set $\{0,1\}^n \setminus (D \cup R)$. If $x \in D$, then it returns the previous response corresponding to $x$. Similarly, if $x \in R$, then it returns the previous query $y \in D$ for which the response was $x$.
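The lazy-sampling procedure for $I_n$ translates directly into code. This is a sketch for a toy $n$; the class name is hypothetical, and uniform sampling from $\{0,1\}^n \setminus (D \cup R)$ is done by rejection sampling.

```python
import random


class LazyInvolution:
    """Lazily sampled involution I_n on {0, ..., 2^n - 1} (hypothetical helper).

    D = set of queries answered so far, R = set of responses given so far.
    """

    def __init__(self, n: int, rng: random.Random):
        self.size = 1 << n
        self.rng = rng
        self.response = {}  # x in D -> I_n(x)
        self.query_of = {}  # y in R -> the query x with I_n(x) = y

    def __call__(self, x: int) -> int:
        if x in self.response:   # x in D: repeat the earlier response
            return self.response[x]
        if x in self.query_of:   # x in R: return the query that produced x
            return self.query_of[x]
        used = self.response.keys() | self.query_of.keys()  # D union R
        while True:              # uniform over {0,1}^n \ (D union R), by rejection
            y = self.rng.randrange(self.size)
            if y not in used:
                break
        self.response[x] = y
        self.query_of[y] = x
        return y
```

Since a fresh $x$ is itself outside $D \cup R$, it may be mapped to itself, so fixed points can occur, exactly as the procedure allows.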

Lemma 1. The uniform random involution $I_n$ (as defined above) is a $\frac{2}{2^n - 2}$-AXU hash function.

Proof. Let $x \ne x' \in \{0,1\}^n$ and $\delta \in \{0,1\}^n$. Let us first assume that $x \oplus x' \ne \delta$. Conditioning on $I_n(x) = y$, we must have $I_n(x') = y \oplus \delta$, which happens with probability at most $\frac{1}{2^n - 2}$. Note that if $y = x'$ or $y = x \oplus \delta$ then the probability is zero. So $\Pr[I_n(x) \oplus I_n(x') = \delta] \le \frac{1}{2^n - 2}$. Now assume $x \oplus x' = \delta$. Then $\Pr[I_n(x) = x'] \le \frac{1}{2^n - 2}$. When $I_n(x) \ne x'$, by a similar argument as before, the differential probability is also bounded above by $\frac{1}{2^n - 2}$. This proves the lemma. □

2.3 Combination of AXU Hash Functions 

Compositions of AXU Hash Functions. Now we show that the property of being an AXU hash function is not preserved under composition with the same key. In other words, there exists an $\epsilon$-AXU $F_L$ for a “small” $\epsilon$ such that $F_L \circ F_L$ is not even $\epsilon'$-AXU for any $\epsilon' < 1$. In particular, if we choose $F_L$ to be a uniform random involution, then $F_L$ is $\frac{2}{2^n - 2}$-AXU (Lemma 1), whereas the composition $F_L \circ F_L$ is the identity function. Trivially, a similar result holds if we apply the CBC mode to a uniform random involution $I_n$. The CBC mode applied to a function $f$ is defined as follows:

$$\mathrm{CBC}^f(x_1, \ldots, x_d) = y_d, \quad \text{where } y_i = f(y_{i-1} \oplus x_i),\ 1 \le i \le d,$$

and $y_0 = 0^n$. So when $d = 2$ and $x_2 = 0$, $\mathrm{CBC}^{I_n}(x_1, 0) = I_n(I_n(x_1)) = x_1$, and so $\mathrm{CBC}^{I_n}$ is not $\epsilon'$-AXU for any $\epsilon' < 1$. However, the property does hold for some specific choices of $F_L$, e.g., when $F_L$ is the field multiplier. In this case, $\mathrm{CBC}^{F_L}$ is nothing but the poly-hash, which has been shown to be $d/2^n$-AXU (see the paragraph immediately after Definition 2).
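A small Python check of both observations over a toy field $\mathrm{GF}(2^8)$ (an assumption): $\mathrm{CBC}^{I}(x_1, 0) = x_1$ for an involution $I$, while CBC of the field multiplier equals $x_1 L^d \oplus \cdots \oplus x_d L$. The involution here is built by pairing up a shuffled domain, so it has no fixed points and is not sampled exactly like $I_n$; the identity holds for any involution. Helper names are hypothetical.

```python
import random

N_BITS, POLY = 8, 0x11D  # toy GF(2^8) with a primitive p(x) (assumption)


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N_BITS:
            a ^= POLY
    return r


def gf_pow(a: int, e: int) -> int:
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r


def random_involution(rng: random.Random):
    """A fixed-point-free involution obtained by pairing up a shuffled domain."""
    elems = list(range(1 << N_BITS))
    rng.shuffle(elems)
    table = [0] * (1 << N_BITS)
    for a, b in zip(elems[0::2], elems[1::2]):
        table[a], table[b] = b, a
    return table.__getitem__


def cbc(f, xs):
    """CBC^f(x_1, ..., x_d) = y_d with y_0 = 0^n and y_i = f(y_{i-1} xor x_i)."""
    y = 0
    for x in xs:
        y = f(y ^ x)
    return y
```

With `inv = random_involution(rng)`, the value `cbc(inv, [x, 0])` returns `x` for every `x`, so two inputs $(x_1, 0) \ne (x'_1, 0)$ always have difference $x_1 \oplus x'_1$.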

Sum of AXU Hash Functions. Now we consider another method of domain extension of an AXU hash function. Given an $\epsilon$-AXU $F_L$, we define the sum hash

$$F^{\mathrm{sum}}_L(x_1, \ldots, x_\ell) = F_L(x_1) \oplus \cdots \oplus F_L(x_\ell).$$

Note that if $F_L$ is linear (which is true for the field multiplier), then the sum hash can be simplified as $F^{\mathrm{sum}}_L(x_1, \ldots, x_\ell) = F_L(x_1 \oplus \cdots \oplus x_\ell)$, for which a collision can be found easily. So it cannot be $\epsilon'$-AXU for any $\epsilon' < 1$. However, this does not work when we consider a uniform random function or involution and we concatenate a counter to message blocks.
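The collision for a linear $F_L$ can be written down explicitly: $(x_1, x_2)$ and $(x_1 \oplus x_2, 0)$ have the same sum hash for every key, since $F_L(0) = 0$. A sketch over the toy field $\mathrm{GF}(2^8)$ (an assumption), with hypothetical helper names:

```python
N_BITS, POLY = 8, 0x11D  # toy GF(2^8) with a primitive p(x) (assumption)


def gf_mul(a: int, b: int) -> int:
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N_BITS:
            a ^= POLY
    return r


def sum_hash(f, xs):
    """F^sum_L(x_1, ..., x_l) = F_L(x_1) xor ... xor F_L(x_l)."""
    acc = 0
    for x in xs:
        acc ^= f(x)
    return acc


def linear_collision(x1: int, x2: int):
    """Two inputs colliding under the sum hash of any linear F_L (requires x2 != 0)."""
    return (x1, x2), (x1 ^ x2, 0)
```

For the field multiplier $F_L(x) = L \cdot x$, the pair returned by `linear_collision(0x3A, 0xC5)` collides for all 256 keys $L$, so the differential probability for the difference $0^n$ is 1.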
3 Description of COBRA

COBRA is an authenticated encryption mode based on a blockcipher. It was originally published at FSE 2014. Later, the same mode with AES as the underlying blockcipher, called AES-COBRA, was submitted to CAESAR [1]. The mode can be viewed as a hash-then-ECB (electronic code book) type encryption, where the hash function is the poly-hash and ECB is applied to double blocks, i.e., $2n$-bit plaintexts. The double-block encryption is defined by a two-round Feistel structure². As it uses a Feistel structure, it is inverse-free. In other words, even though it is based on the AES blockcipher, the decryption of COBRA does not require AES decryption.



Fig. 3.1. COBRA mode for ciphertext and tag generation for a message of three double blocks. U is obtained from the associated data, N is the nonce and L is the hash key.


3.1 Encryption Mode for COBRA 

COBRA is defined for messages of size at least $n$ bits. We briefly describe how the encryption algorithm of COBRA works for all inputs $M \in (\{0,1\}^{2n})^+$. In addition to a message $M$, it also takes a nonce $N \in \{0,1\}^n$ and associated data $A$, and outputs a tagged ciphertext $(C, T)$ where $|C| = |M|$ and $T \in \{0,1\}^n$. Readers are referred to [1, 4] for a complete description of the algorithm (i.e., how it behaves for other input sizes). We write $M = M_1 \| \cdots \| M_d$ for some positive integer $d$, where $M_1, \ldots, M_d \in \{0,1\}^{2n}$. We also write $M_i = (M_i[1], M_i[2])$, where $M_i[1], M_i[2] \in \{0,1\}^n$ are called blocks and the $M_i$ are called double blocks. Let the $\pi_i$ and $\gamma_i$ be independent uniform random (or pseudorandom) permutations over $\{0,1\}^n$ for all $i \ge 1$. We describe the COBRA mode based on these permutations³. It uses the two-round Feistel structure, which is defined as follows:

$$\mathrm{LR}_i(X[1], X[2]) = (Y[1], Y[2]), \quad X[1], X[2] \in \{0,1\}^n,$$

2 The security analysis of the 3- and 4-round versions, as well as a characterization of Luby-Rackoff constructions, can be found in prior work.

3 These are actually derived from a single blockcipher using the standard masking technique (i.e., the XEX construction).
where

1. $Y[1] = X[1] \oplus \pi_i(X[2])$ and

2. $Y[2] = X[2] \oplus \gamma_i(Y[1])$.

It is easy to see that it is invertible, and the inverse function is $\mathrm{LR}_i^{-1}(Y[1], Y[2]) = (X[1], X[2])$, where $X[2] = \gamma_i(Y[1]) \oplus Y[2]$ and $X[1] = \pi_i(X[2]) \oplus Y[1]$. Note that the inverse uses only the forward directions of $\pi_i$ and $\gamma_i$.
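The two-round structure $\mathrm{LR}_i$ and its inverse can be sketched as below, with toy 8-bit random permutations standing in for $\pi_i$ and $\gamma_i$ (an assumption; COBRA derives them from AES). Note that `lr_inv` calls `pi` and `gamma` only in the forward direction, which is what makes the mode inverse-free. Function names are hypothetical.

```python
import random

N_BITS = 8  # toy block size n (COBRA uses n = 128)


def random_perm(n: int, rng: random.Random):
    """Toy stand-in for an independent permutation pi_i or gamma_i on {0,1}^n."""
    table = list(range(1 << n))
    rng.shuffle(table)
    return table.__getitem__


def lr(pi, gamma, x1: int, x2: int):
    y1 = x1 ^ pi(x2)     # Y[1] = X[1] xor pi_i(X[2])
    y2 = x2 ^ gamma(y1)  # Y[2] = X[2] xor gamma_i(Y[1])
    return y1, y2


def lr_inv(pi, gamma, y1: int, y2: int):
    x2 = gamma(y1) ^ y2  # X[2] = gamma_i(Y[1]) xor Y[2]
    x1 = pi(x2) ^ y1     # X[1] = pi_i(X[2]) xor Y[1]
    return x1, x2
```

Exhaustively applying `lr` followed by `lr_inv` to all $2^{16}$ toy inputs returns every input unchanged.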


Algorithm: COBRA Encryption

Input: $(M_1[1], M_1[2], \ldots, M_d[1], M_d[2]) \in (\{0,1\}^n)^{2d}$, $N \in \{0,1\}^n$
Output: $(C_1, C_2, \ldots, C_d) \in (\{0,1\}^{2n})^d$

1 for $i = 1$ to $d$
2     $P_i[1] = \mathrm{poly}_L(1, N, M_1[1], M_1[2], \ldots, M_i[1])$;
3     $P_i[2] = \mathrm{poly}_L(1, N, M_1[1], M_1[2], \ldots, M_i[1], M_i[2])$;
4     $C_i = \mathrm{LR}_i(P_i[1], P_i[2])$;
5 end for loop
6 Return $(C_1, C_2, \ldots, C_d)$

Algorithm 1. COBRA encryption algorithm for a nonce $N \in \{0,1\}^n$ and a message $M$ whose size is a multiple of $2n$. Note that the associated data has no influence on the ciphertext; it is used only for computing the tag.


3.2 Tag Generation and Verified Decryption Algorithm 

The final tag $T$ is computed from the nonce $N$, $U$ (which depends only on the associated data $A$) and

$$S := \bigoplus_{i=1}^{d} \big(P_i[1] \oplus P_i[2] \oplus C_i[1] \oplus C_i[2]\big).$$

We simply denote the tag by $T(N, U, S)$. One can find the details of the construction of $T$ in [1, 4]. The verified decryption algorithm takes a tagged ciphertext $(C_1, \ldots, C_d, T)$, where the $C_i$ are double blocks and $T \in \{0,1\}^n$. It works as follows:

1. It first computes $P_i = \mathrm{LR}_i^{-1}(C_i)$, $1 \le i \le d$.

2. It returns $\bot$ if $T \ne T(N, U, S)$.

3. Else it returns $(M_1, \ldots, M_d)$, where $M_i[2] = L \cdot P_i[1] \oplus P_i[2]$ and $M_i[1] = L \cdot P_{i-1}[2] \oplus P_i[1]$, $1 \le i \le d$.
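Putting Algorithm 1 together with steps 1 and 3 of verified decryption gives the toy sketch below, over $\mathrm{GF}(2^8)$ with random 8-bit permutations (assumptions made for size; function names are hypothetical). The tag function $T(N, U, S)$ is not specified here, so the sketch only returns $S$; for $i = 1$ it uses the convention $P_0[2] := \mathrm{poly}_L(1, N) = L \oplus N$, which the text leaves implicit.

```python
import random

N_BITS, POLY = 8, 0x11D  # toy GF(2^8) with a primitive p(x); COBRA itself uses n = 128


def gf_mul(a: int, b: int) -> int:
    """Field multiplication a . b in GF(2^N_BITS) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N_BITS:
            a ^= POLY
    return r


def random_perm(rng: random.Random):
    """Toy stand-in for an independent permutation pi_i or gamma_i."""
    table = list(range(1 << N_BITS))
    rng.shuffle(table)
    return table.__getitem__


def cobra_encrypt(L, N, pis, gammas, msg):
    """Algorithm 1 on double blocks msg = [(M_i[1], M_i[2]), ...]; also returns S."""
    h = L ^ N                    # poly_L(1, N) = 1 * L + N by Horner's rule
    cts, S = [], 0
    for i, (m1, m2) in enumerate(msg):
        p1 = gf_mul(h, L) ^ m1   # P_i[1] = poly_L(1, N, ..., M_i[1])
        p2 = gf_mul(p1, L) ^ m2  # P_i[2] = poly_L(1, N, ..., M_i[1], M_i[2])
        c1 = p1 ^ pis[i](p2)     # LR_i, first round
        c2 = p2 ^ gammas[i](c1)  # LR_i, second round
        cts.append((c1, c2))
        S ^= p1 ^ p2 ^ c1 ^ c2
        h = p2
    return cts, S


def cobra_decrypt(L, N, pis, gammas, cts):
    """Steps 1 and 3 of verified decryption (the tag check itself is not modelled)."""
    prev, msg, S = L ^ N, [], 0  # convention: P_0[2] := poly_L(1, N)
    for i, (c1, c2) in enumerate(cts):
        p2 = gammas[i](c1) ^ c2  # LR_i^{-1}, forward calls only
        p1 = pis[i](p2) ^ c1
        msg.append((gf_mul(L, prev) ^ p1,  # M_i[1] = L . P_{i-1}[2] xor P_i[1]
                    gf_mul(L, p1) ^ p2))   # M_i[2] = L . P_i[1] xor P_i[2]
        S ^= p1 ^ p2 ^ c1 ^ c2
        prev = p2
    return msg, S
```

Decrypting the output of `cobra_encrypt` recovers both the message and the same $S$, which is the value that the tag check compares against $T(N, U, S)$.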
4 Forging Attack on COBRA

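We first state the following fact, which plays a key role in our forging attack.

Fact 1. Let $h \in \{0,1\}^n$ be a fixed element and let $h_1^0, h_1^1, \ldots, h_s^0, h_s^1$ be chosen uniformly from $\{0,1\}^n$. Then the probability that there exist $b_1, \ldots, b_s \in \{0,1\}$ such that $\bigoplus_{i=1}^{s} h_i^{b_i} = h$ is at least $1 - 2^{n-s}$. Furthermore, the sequence $b_1, \ldots, b_s$ can be efficiently computed, since writing $h_i^{b_i} = h_i^0 \oplus b_i (h_i^0 \oplus h_i^1)$ turns the condition into a linear system over GF(2) with $n$ equations and $s$ unknowns.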
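Fact 1 reduces to linear algebra over GF(2): one needs $\bigoplus_i b_i (h_i^0 \oplus h_i^1) = h \oplus \bigoplus_i h_i^0$. The sketch below solves this by Gaussian elimination on $n$-bit integers, tracking which difference vectors were combined; the function names are hypothetical.

```python
import random


def xor_select(pairs, bits):
    """XOR of h_i^{b_i} over all i."""
    acc = 0
    for (h0, h1), b in zip(pairs, bits):
        acc ^= h1 if b else h0
    return acc


def solve_fact1(h, pairs, n):
    """Find b_1..b_s with XOR_i h_i^{b_i} = h via GF(2) elimination, or return None."""
    target = h
    for h0, _ in pairs:
        target ^= h0                 # move XOR_i h_i^0 to the right-hand side
    basis = {}                       # leading bit -> (vector, mask of indices combined)
    for i, (h0, h1) in enumerate(pairs):
        v, mask = h0 ^ h1, 1 << i    # difference vector d_i
        for bit in reversed(range(n)):
            if not (v >> bit) & 1:
                continue
            if bit in basis:         # eliminate this leading bit
                bv, bm = basis[bit]
                v, mask = v ^ bv, mask ^ bm
            else:                    # new pivot
                basis[bit] = (v, mask)
                break
    mask = 0
    for bit in reversed(range(n)):   # express the target in the basis
        if (target >> bit) & 1 and bit in basis:
            bv, bm = basis[bit]
            target, mask = target ^ bv, mask ^ bm
    if target:
        return None                  # target not in the span: no solution
    return [(mask >> i) & 1 for i in range(len(pairs))]


def random_pairs(n, s, seed):
    """Uniform h_i^0, h_i^1 as in Fact 1 (toy instance generator)."""
    rng = random.Random(seed)
    return [(rng.getrandbits(n), rng.getrandbits(n)) for _ in range(s)]
```

When the $s$ difference vectors span $\{0,1\}^n$, which fails with probability at most $2^{n-s}$, a solution always exists; otherwise the function may return `None`.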

Key Idea of the Forging Attack. Now we describe the main idea of our forging attack. Our attack fixes the nonce and associated data, so we simply denote the tag $T(N, U, S)$ by $T(S)$. Suppose $M^0$ is an encryption query with $T^0 := T(S^0)$ as its tag, where $S^0$ denotes the $S$-value for the message $M^0$. Suppose $C_i^0$ and $C_i^1$ are the $i$th double-block ciphertexts for two different queries, $1 \le i \le s$. By Fact 1, we can find $b_1, \ldots, b_s$ such that $\bigoplus_{i=1}^{s} (C_i^{b_i}[1] \oplus C_i^{b_i}[2]) = S^0$. So if we can choose messages such that $\bigoplus_{i=1}^{s} (P_i^{b_i}[1] \oplus P_i^{b_i}[2]) = 0^n$ happens with high probability, then $(C_1^{b_1}, \ldots, C_s^{b_s}, T^0)$ is a valid tagged ciphertext, since its $S$-value is then $0^n \oplus S^0 = S^0$. As poly-hash is linear, we can ensure that $\bigoplus_{i=1}^{s} (P_i^{b_i}[1] \oplus P_i^{b_i}[2]) = 0^n$ holds with high probability for suitably chosen queries.


Forging Algorithm $\mathcal{F}_0$. Now let us fix a positive integer $\ell$ whose exact value will be determined later. We define the following messages:

$$M^i := \big((0, 0)^{i-1}, (0, 1)\big), \quad 1 \le i \le \ell.$$

Let $M^0$ be the all-zero message of $\ell$ double blocks. Our forging algorithm makes $\ell + 1$ queries, namely the $M^i$'s.

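Forging Algorithm $\mathcal{F}_0$ for COBRA.

1. It makes encryption queries $M^i$ and obtains responses $(C^i, T^i)$, $0 \le i \le \ell$.

2. Let $C^0 = (C_1^0[1], C_1^0[2], \ldots, C_\ell^0[1], C_\ell^0[2])$ and $h_i^0 = C_i^0[1] \oplus C_i^0[2]$, $1 \le i \le \ell$.

3. For $i = 1$ to $\ell$, let $C^i = (C_1^i[1], C_1^i[2], \ldots, C_i^i[1], C_i^i[2])$ and $h_i^1 = C_i^i[1] \oplus C_i^i[2]$.

4. Let $h = \bigoplus_{i=1}^{\ell} h_i^0$ (the sum of the ciphertext blocks for $M^0$).

5. Based on Fact 1, it finds a sequence $b_1, \ldots, b_{\ell-1} \in \{0,1\}$ such that $\bigoplus_{j=1}^{\ell-1} h_j^{b_j} = h \oplus h_\ell^1$.

6. If there is no such sequence then it aborts; else it proceeds.

7. If $b_1 \oplus \cdots \oplus b_{\ell-1} \ne 1$ then it aborts.

8. Else it makes the forgery $(C^* := (C_1^*, \ldots, C_\ell^*), T^0)$ where, for all $1 \le i \le \ell - 1$,

$$C_i^* = \begin{cases} C_i^i[1] \,\|\, C_i^i[2] & \text{if } b_i = 1,\\ C_i^0[1] \,\|\, C_i^0[2] & \text{if } b_i = 0,\end{cases}$$

and $C_\ell^* = C_\ell^\ell[1] \,\|\, C_\ell^\ell[2]$.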
Now we compute the success probability of the forging attack. The forging algorithm aborts in two cases. We show that the abort probabilities are small. Moreover, given that it does not abort, we also show that the forging attack works perfectly.

Theorem 1. The forgery algorithm $\mathcal{F}_0$ has success probability at least $\frac{1}{2} \times (1 - 2^{-n})$ when we set $\ell = 2n$.

Proof. In the ideal case, $h_i^0$ and $h_i^1$ are independently and (almost) uniformly drawn from $\{0,1\}^n$, as these are the xor of the two blocks of the $i$th double-block ciphertext for the fresh queries $M^0$ and $M^i$, respectively. Note that $M^i$ and $M^0$ have different double-block values in the $i$th position. By Fact 1, with probability at least 1/2, we can efficiently find $b_1, \ldots, b_{\ell-1}$ such that $\bigoplus_{j=1}^{\ell-1} h_j^{b_j} = h \oplus h_\ell^1$.

Claim. Let us assume that we have found such $b_1, \ldots, b_{\ell-1} \in \{0,1\}$, which happens with probability at least $1 - 2^{n-\ell}$. Then

$$b_1 \oplus \cdots \oplus b_{\ell-1} = 1 \implies (C^*, T^0) \text{ is a valid ciphertext-tag pair.}$$


We first note that $(C^*, T^0)$ is a fresh tagged ciphertext, as the last double block of the ciphertext is different from those of all other tagged ciphertexts. To prove that the tagged ciphertext is valid, we compute $S^*$ and $S^0$ for the forged ciphertext and $M^0$, respectively, where $S^0$ denotes the $S$-value for the message $M^0$.


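Computation of $S^0$. The computation of $S^0$ is straightforward from its definition:
$$S^0 := \bigoplus_{i=1}^{\ell}\left(P^0_i[1] \oplus P^0_i[2]\right) \oplus \bigoplus_{i=1}^{\ell}\left(C^0_i[1] \oplus C^0_i[2]\right).$$

Now note that $P^0_i[1] = \mathrm{poly}_L(1, N, 0^{2i-2}, 0)$ and $P^0_i[2] = \mathrm{poly}_L(1, N, 0^{2i-1}, 0)$. Let $\Sigma = \bigoplus_{i=1}^{\ell}\left(\mathrm{poly}_L(1, N, 0^{2i-1}, 0) \oplus \mathrm{poly}_L(1, N, 0^{2i-2}, 0)\right)$. So
$$S^0 = h \oplus \bigoplus_{i=1}^{\ell}\left(\mathrm{poly}_L(1, N, 0^{2i-1}, 0) \oplus \mathrm{poly}_L(1, N, 0^{2i-2}, 0)\right) = h \oplus \Sigma.$$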


Computation of $S^*$. Now we compute $S^*$ under the assumption that the first abort does not occur, i.e., we have found $b_1, \ldots, b_{\ell-1}$ such that $\bigoplus_{j=1}^{\ell-1} h^j_{b_j} = h \oplus h^\ell_1$.

Note that the XOR of the ciphertext blocks of $C^*$ is equal to $\bigoplus_{j=1}^{\ell-1} h^j_{b_j} \oplus h^\ell_1 = h$.

Now we decrypt the forged ciphertext double blocks by applying $2\mathrm{LR}^{-1}$. Let $P^*_i := (P^*_i[1], P^*_i[2])$ be the $i$th double block of the forged ciphertext after we apply the two-round Luby-Rackoff decryption. Similarly, we denote the $P_i$ values for the query $M^j$ by $P^j_i$. As every ciphertext double block $C^*_i$ has appeared in the response to a query at the same position, the values $P^*_i$, $1 \le i \le \ell$, are given as below.


$$P^*_i = \begin{cases} P^i_i[1] \,\|\, P^i_i[2] & \text{if } b_i = 1,\\ P^0_i[1] \,\|\, P^0_i[2] & \text{if } b_i = 0, \end{cases} \qquad 1 \le i \le \ell - 1,$$
and $P^*_\ell = P^\ell_\ell[1] \,\|\, P^\ell_\ell[2]$. Note that

1. $P^i_i[1] = \mathrm{poly}_L(1, N, 0^{2i-1})$ and $P^i_i[2] = \mathrm{poly}_L(1, N, 0^{2i-1}1)$,

2. $P^0_i[1] = \mathrm{poly}_L(1, N, 0^{2i-1})$ and $P^0_i[2] = \mathrm{poly}_L(1, N, 0^{2i})$.

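By linearity of $\mathrm{poly}_L$, we can simply write

1. $P^*_i[1] \oplus P^*_i[2] = \mathrm{poly}_L(1, N, 0^{2i-1}) \oplus \mathrm{poly}_L(1, N, 0^{2i}) \oplus b_i$ for $1 \le i \le \ell - 1$, and

2. $P^*_\ell[1] \oplus P^*_\ell[2] = \mathrm{poly}_L(1, N, 0^{2\ell-1}) \oplus \mathrm{poly}_L(1, N, 0^{2\ell}) \oplus 1$.

So $\bigoplus_{i=1}^{\ell}\left(P^*_i[1] \oplus P^*_i[2]\right) = \Sigma \oplus 1 \oplus \bigoplus_{j=1}^{\ell-1} b_j$ and

$$S^* = h \oplus \Sigma \oplus 1 \oplus \bigoplus_{j=1}^{\ell-1} b_j.$$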

Now, if the second abort does not occur (i.e., $\bigoplus_j b_j = 1$), we have $S^* = h \oplus \Sigma = S^0$. This proves the claim.

Probabilities of the Abort Events. Now we informally argue that $\Pr[\bigoplus_{j=1}^{\ell-1} b_j = 1] = 1/2$, so the second abort occurs with probability 1/2. Note that the values $C^i_i[1], C^i_i[2], C^0_i[1], C^0_i[2]$ are independent, and so are $h^i_0, h^i_1$ for all $1 \le i \le \ell$. Thus, conditioned on the $h^i_0$'s, the choices of the $b_i$'s are independent and uniform, so the probability is 1/2. By Fact 1, the first abort does not occur with probability $1 - 2^{n-\ell}$, and, as just argued, the second abort does not occur with probability 1/2. Hence the success probability of forging is at least $\frac{1}{2}(1 - 2^{n-\ell})$, which is almost 1/2 if we set $\ell = 2n$. □

Remark 1. Note that in the above attack, we can verify whether the forged ciphertext-tag pair is valid without querying it. So we can repeat this process $n$ times (choosing bit 0 or 1 in a different position instead of the last bit, as described above) to succeed with probability about $1 - 2^{-n}$.

Remark 2. In the above analysis, we make several probabilistic assumptions to keep the analysis clean and simple. We list them here.

1. We assume that the $h^i_b$'s are independent and uniform. However, for a fixed $i$, $h^i_1$ and $h^i_0$ are not completely independent, as these are generated from an ideal online random permutation. Nevertheless, the $4n$ outputs $C^i_i[1], C^i_i[2], C^0_i[1], C^0_i[2]$ are statistically very close to the uniform distribution, with distance about $\binom{4n}{2}/2^n$.

2. The true distributions of the $b_i$'s may not be uniform and independent. This actually depends on how we define the $b_i$'s, as there could be more than one choice of them. However, all of these choices would lead to an abort with probability of about 1/2 or less.

5 Security Analysis of POET and POET-m 

In this section we analyze POET and its parallel variant POET-m for some 
positive integer m. 
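Algorithm: POET-m Encryption
Input: $M = (M_1, \ldots, M_\ell) \in (\{0,1\}^n)^\ell$
Output: $(C_1, C_2, \ldots, C_\ell, T) \in (\{0,1\}^n)^{\ell+1}$

1 for $i \leftarrow 1$ to $\ell - 1$
2   $X_i \leftarrow \tau \oplus F_{L_{top}}(M_1 \oplus L_1) \oplus F_{L_{top}}(M_2 \oplus L_2) \oplus \cdots \oplus F_{L_{top}}(M_i \oplus L_i)$;
3   $Y_i \leftarrow E_K(X_i)$;
4   $C_i \leftarrow F_{L_{bot}}(Y_{i-1} \oplus Y_i) \oplus L_i$;
5 end for loop
6 $X_\ell \leftarrow F_{L_{top}}(X_{\ell-1}) \oplus M_\ell$;
7 $Y_\ell \leftarrow E_K(X_\ell)$;
8 $C_\ell \leftarrow F_{L_{bot}}(Y_{\ell-1} \oplus Y_\ell)$;
9 $X_{\ell+1} \leftarrow F_{L_{top}}(X_\ell) \oplus S \oplus \tau$;
10 $Y_{\ell+1} \leftarrow E_K(X_{\ell+1})$;
11 $T \leftarrow F_{L_{bot}}(Y_\ell) \oplus Y_{\ell+1} \oplus S$;
12 Return $(C_1, C_2, \ldots, C_\ell, T)$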


Algorithm 2. POET-m encryption algorithm for a message $M$ of $\ell$ blocks with $\ell < m$. Let $\tau$ be an $n$-bit element which is derived from the associated data. The elements $L_1, \ldots, L_{m-1}$ are derived keys, and $S$ is a key derived from the length of the message.



Fig. 5.1. POET-m mode for ciphertext and tag generation. $X_0 = Y_0 = \tau$ is obtained from the associated data. We denote $F_{L_{top}}$ and $F_{L_{bot}}$ simply by $F_t$ and $F_b$, respectively.
5.1 POET-m and Its Security Analysis

POET-m. We first describe the ciphertext generation algorithm of the parallel version POET-m. We consider $F_L$ to be the field-multiplier hash, in which a message block is multiplied by the key $L$. We describe how POET-m works for all messages with $\ell < m$. Let $\tau$ be an $n$-bit element which is derived from the associated data. The elements $L_1, \ldots, L_{m-1}$ are keys derived by invoking a pseudorandom permutation on different constants (see [1] for details). Note that the input $X_i$ of the blockcipher is a sum hash. When we instantiate the AXU hash by the field multiplier, we can simplify the sum hash (due to linearity). We have

$$X_i = \tau \oplus L_{top} \cdot (M_1 \oplus \cdots \oplus M_i) \oplus L',$$

where $L'$ is the remaining part, which depends only on the keys. We use this expression to mount the attack.

Privacy Attack on POET-m. We first demonstrate a distinguishing attack that distinguishes POET-m from a uniform random online cipher when $m > 4$. We make two queries

1. $M = (M_1, M_2, M_3, M_4)$ and

2. $M' = (M'_1, M'_2, M'_3 := M_3, M'_4)$ such that $M_1 \neq M'_1$ and $M_1 \oplus M_2 = M'_1 \oplus M'_2$.

We denote the corresponding internal variables by $X_i, C_i$ and $X'_i, C'_i$. It is easy to see that $X_2 = X'_2$ and $X_3 = X'_3$, and hence $C_3 = C'_3$ with probability one. This equality of the third ciphertext blocks happens with probability $2^{-n}$ for a uniform random online cipher. So we have a distinguisher which succeeds with probability almost one. The presence of the fourth block ensures that the $X_i$'s are defined as above (as the final block is processed differently). We can keep all other inputs, for example the nonce and associated data, the same.
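The linearity behind this distinguisher can be checked numerically. The following Python sketch is an illustration under stated assumptions: $F_{L_{top}}$ is modeled as multiplication in GF($2^{128}$) with the reduction polynomial $x^{128} + x^7 + x^2 + x + 1$ (a common choice, assumed here rather than taken from POET-m), and $\tau$, $L_{top}$ and the $L_i$ are random stand-ins for the derived values. It computes the $X_i$ of line 2 of Algorithm 2 for the two queries.

```python
import secrets

R = (1 << 128) | 0x87   # x^128 + x^7 + x^2 + x + 1 (assumed field representation)

def gf_mul(a, b):
    """Multiply a and b in GF(2^128); bit i holds the coefficient of x^i."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a ^= R
    return p

def x_values(tau, L_top, Ls, M):
    """X_i = tau ^ F(M_1 ^ L_1) ^ ... ^ F(M_i ^ L_i), with F(x) = L_top * x."""
    xs, acc = [], tau
    for m, key in zip(M, Ls):
        acc ^= gf_mul(L_top, m ^ key)
        xs.append(acc)
    return xs

tau = secrets.randbits(128)
L_top = secrets.randbits(128) or 1                  # any nonzero multiplier key
Ls = [secrets.randbits(128) for _ in range(4)]
M = [secrets.randbits(128) for _ in range(4)]

M1p = M[0] ^ 1                                      # M'_1 != M_1
Mp = [M1p, M[0] ^ M[1] ^ M1p, M[2], secrets.randbits(128)]  # M'_1^M'_2 = M_1^M_2, M'_3 = M_3

X, Xp = x_values(tau, L_top, Ls, M), x_values(tau, L_top, Ls, Mp)
print(X[0] != Xp[0], X[1] == Xp[1], X[2] == Xp[2])  # True True True
```

Because multiplication distributes over XOR, the difference $X_i \oplus X'_i$ equals $L_{top} \cdot \bigoplus_{j \le i}(M_j \oplus M'_j)$, which vanishes from the second block onward whatever random values are drawn.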

Forging Attack on POET-m. Now we see how we can exploit the above weakness in the sum of AXU hashes to mount a forging attack on the construction. We can forge when the number of message blocks is less than $m$ and the last block is complete (as described in Algorithm 2). We first briefly describe how the decryption algorithm works. Assume $m > 3$ and let $C_1, C_2, C_3, T$ be an input for decryption, where $C_i, T \in \{0,1\}^n$. We note the following observations:

1. $Y_i$ depends on $C_1 \oplus \cdots \oplus C_i$ for $i < 3$.

2. The verification algorithm depends on $X_3, Y_3, T$ and some fixed values depending on the associated data and key.

We make one query $M = (M_1, M_2, M_3)$ and obtain the response $(C, T)$, where $C = (C_1, C_2, C_3)$. Let $C' = (C'_1, C'_2, C'_3 := C_3) \neq C$ be such that $C_1 \oplus C_2 = C'_1 \oplus C'_2$. We denote the corresponding internal variables by $X_i, Y_i$ and $X'_i, Y'_i$. By the choice of $C'$ and the first observation, it is easy to see that $Y_3 = Y'_3$, and so $X_3 = X'_3$. Again, by the second observation, the verification algorithm depends on $X_3, Y_3, T$ (or $X'_3, Y'_3, T$) and some fixed information based on the key and associated data. So, whenever the verification algorithm passes for $X_3, Y_3$, it must pass for $X'_3, Y'_3$. Thus, $(C'_1, C'_2, C_3, T)$ is a valid forgery.

Note that the above attack is a single-query forging attack, and hence it is also applicable to situations where the nonce cannot be reused.

5.2 Security Analysis of POET mode 

POET. We now describe the ciphertext generation algorithm of POET, i.e., of POE, the underlying encryption algorithm. In [1], the following theorem (restated) was claimed:




Fig. 5.2. POET mode for ciphertext and tag generation. $X_0 = Y_0 = \tau$ is obtained from the associated data. In this figure, $F_t$ and $F_b$ are any independent $\epsilon$-AXU hash functions.


Theorem 8.1 of POET Submission in [1]. Let $E$ be a uniform random permutation, and let $F_t$ and $F_b$ be independent $\epsilon$-AXU hash functions. Then, for any privacy adversary $A$ making at most $q$ queries of a total length of at most $\sigma$ blocks, we have

$$\mathrm{Adv}^{\mathrm{priv}}_{\mathrm{POET}}(A) \le \epsilon\sigma^2 + \frac{\sigma^2}{2^n - \sigma}.$$

Here we consider $F_t$ and $F_b$ to be arbitrary AXU functions, as in Theorem 8.1 of the POET submission [1] stated above. Given a message $(M_1, \ldots, M_\ell)$, we compute, for $1 \le i \le \ell - 1$,

$$C_i = F_b(Y_{i-1}) \oplus Y_i, \qquad Y_i = E_K(X_i), \qquad X_i = F_t(X_{i-1}) \oplus M_i,$$

where $X_0 = Y_0 = \tau$. The last ciphertext block is computed differently, and we do not need its description for our distinguishing attack. Note that $X_i$ is computed by $\mathrm{CBC}_{F_t}$. If $F_t$ is a uniform random involution then, as we have seen before, $F_t$ applied in CBC mode is no longer $\delta$-AXU for any $\delta < 1$. We use this property to obtain a distinguisher.
Privacy Attack on POET with Uniform Random Involution $F_t$. Now we demonstrate a privacy attack on POET, distinguishing it from a uniform random cipher when $F_t$ is instantiated with a uniform random involution. In this attack we only make a single query, so it is also nonce-respecting. This would violate Theorem 8.1 of the POET submission [1] (online permutation security of POE). We believe that the theorem remains valid when $F_t$ is instantiated with the field multiplier (however, the proof needs to be revised). The attack is described below.

Claim. $\Pr[C_2 = C_4] = 1$, where $(C_1, C_2, \ldots)$ is the response to $(M_1, 0, 0, 0, \ldots)$ for POET with involution $F_t$.

We prove the claim by using the involution property of $F_t$. We can easily see that

1. $X_3 = F_t(F_t(X_1)) = X_1$ and

2. similarly, $X_4 = X_2 = F_t(X_1)$.

So $Y_1 = Y_3$ and $Y_2 = Y_4$, and hence $C_2 = C_4$. Note that we can choose an arbitrary nonce and associated data. This proves the claim.

In the ideal case, we observe $C_2 = C_4$ with probability $2^{-n}$. So the distinguisher of POET has advantage at least $1 - 2^{-n}$.
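A toy simulation makes the claim concrete. The sketch below is a small-scale model, not POET itself: it uses 8-bit blocks, a random permutation table for $E_K$, a random fixed-point-free involution for $F_t$, and an arbitrary random table for $F_b$ (a stand-in for an AXU hash), and it processes every block with the non-final recurrence, ignoring the special last block and the tag.

```python
import random

def random_involution(size, rng):
    """Random fixed-point-free involution on {0, ..., size-1} (size even):
    shuffle the domain and swap consecutive pairs."""
    xs = list(range(size))
    rng.shuffle(xs)
    f = [0] * size
    for a, b in zip(xs[0::2], xs[1::2]):
        f[a], f[b] = b, a
    return f

def poet_inner_blocks(M, tau, Ft, Fb, E):
    """Non-final POET blocks: X_i = Ft(X_{i-1}) ^ M_i, Y_i = E(X_i), C_i = Fb(Y_{i-1}) ^ Y_i,
    starting from X_0 = Y_0 = tau (last block and tag omitted)."""
    X = Y = tau
    C = []
    for m in M:
        X = Ft[X] ^ m
        Y_prev, Y = Y, E[X]
        C.append(Fb[Y_prev] ^ Y)
    return C

n = 8
size = 1 << n
rng = random.Random(2014)
Ft = random_involution(size, rng)                 # F_t: uniform random involution
E = list(range(size))
rng.shuffle(E)                                    # E_K: random permutation
Fb = [rng.randrange(size) for _ in range(size)]   # F_b: toy stand-in for an AXU hash
tau = rng.randrange(size)

C = poet_inner_blocks([rng.randrange(size), 0, 0, 0], tau, Ft, Fb, E)
print(C[1] == C[3])   # True: C_2 = C_4 for any choice of tau and M_1
```

The equality holds for every seed, because $X_3 = X_1$ and $X_4 = X_2$ follow from $F_t(F_t(x)) = x$ alone, independently of how $E_K$ and $F_b$ are chosen.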

6 Conclusion 

In this paper, we demonstrate a forging attack on COBRA with practical complexity. Hence the theorem proved in [J] is wrong. We also demonstrate forging and distinguishing attacks on POET-m for one particular recommended choice of AXU hash function. We also disprove the security claim for POET by presenting a distinguishing attack on a different choice of AXU hash function (not in the recommended list). However, these attacks on POET do not carry over to the versions submitted to CAESAR.
Abstract. So far, low probability differentials for the key schedule of block ciphers have been used as a straightforward proof of security against related-key differential analysis. To achieve resistance, it is believed that for a cipher with a $k$-bit key it suffices for the probability to be upper bounded by $2^{-k}$. Surprisingly, we show that this reasonable assumption is incorrect, and the probability should be (much) lower than $2^{-k}$. Our counterexample is a related-key differential analysis of the well-established block cipher CLEFIA-128. We show that although the key schedule of CLEFIA-128 prevents differentials with a probability higher than $2^{-128}$, the linear part of the key schedule that produces the round keys, and the Feistel structure of the cipher, allow us to exploit particularly chosen differentials with a probability as low as $2^{-128}$. CLEFIA-128 has $2^{14}$ such differentials, which translate to $2^{14}$ pairs of weak keys. The probability of each differential is too low, but the weak keys have a special structure which, with a divide-and-conquer approach, allows us to gain an advantage of $2^7$ over generic analysis. We exploit the advantage and give a membership test for the weak-key class and provide analysis of the hashing modes. The proposed analysis has been tested with computer experiments on small-scale variants of CLEFIA-128. Our results do not threaten the practical use of CLEFIA.

Keywords: CLEFIA, cryptanalysis, weak keys, CRYPTREC, differentials.


1 Introduction 

CLEFIA [13] is a block cipher designed by Sony. It is advertised as a fast encryption algorithm in both software and hardware, and it is claimed to be highly secure. The efficiency comes from the generalized Feistel structure and the byte orientation of the algorithm. The security is based on a novel technique called the Diffusion Switching Mechanism, which increases resistance against linear and differential attacks in both the single-key and related-key models. These and several other attractive features of CLEFIA-128 have been widely recognized, and the cipher
has been submitted for standardization (and already standardized) by several
bodies: CLEFIA was submitted to IETF (Internet Engineering Task Force) jT] , 
it is on the Candidate Recommended Ciphers List¹ of CRYPTREC (the Japanese government standardization body), and it is one of only two² lightweight block ciphers recommended by the ISO/IEC standard [8].

A significant body of analysis papers has been published on round-reduced versions of CLEFIA [18,19,14,17,15,10,16,9,6], all in the single-key model, but analysis based on related keys is missing. Often this type of analysis can cover a higher number of rounds, but it requires the cipher to have a relatively simple and almost linear key schedule. CLEFIA, however, has a highly non-linear key schedule, roughly equivalent to 2/3 of the state transformation and designed with the intention of making the cipher resistant against analysis based on related-key differentials. Using a widely accepted approach, the designers have proved that no such analysis could exist, as the key schedule has only low probability ($\le 2^{-128}$ for CLEFIA with 128-bit keys) differential characteristics. Note that we will not try to exploit the fact that some characteristics can be grouped into a differential that has a much higher probability than the individual characteristics. Our results go a step further: we show that key schedule differentials with a probability as low as $2^{-128}$ can still be used in analysis. This happens when they have a special structure, namely, when the input/output differences of the differentials are not completely random but belong to a set that, as in the case of CLEFIA-128, is described by a linear relation.

We exploit the special form of the key schedule: a large number of non-linear transformations at the beginning of the key schedule is followed by light linear transformations that are used to produce the round keys. In the submission paper of CLEFIA-128, the proof of related-key security is based only on the non-linear part, as this part guarantees that the probability of any output difference is $2^{-128}$. In contrast, our analysis exploits the linear part, and we show that there are $2^{14}$ of the above low probability differences which, when supplied to the linear part, produce a special type of iterative round key differences. CLEFIA-128 is a Feistel cipher and, as shown in [5], iterative round key differences lead to an iterative differential characteristic in the state that holds with probability 1. Therefore we obtain related-key differentials with probability 1 in the state and $2^{-128}$ in the key schedule. The low probability ($2^{-128}$) of each of the $2^{14}$ iterative round key differences means that for each of them there is only one pair of keys that produces such differences, or $2^{14}$ pairs in total for all of them; these pairs form the weak-key class of the cipher. When we target each pair independently, we cannot exploit the differentials. However, the whole set of $2^{14}$ pairs has a special structure, and we can independently target two smaller sets of size $2^7$ and thus obtain an advantage of $2^7$ over generic analysis. As we will see in the paper, the special structure of the weak-key class is due to the linear part of the key schedule; therefore we exploit the weakness of this part twice (the first time for producing iterative round key differences).


1 This is the final stage of evaluation before becoming a CRYPTREC standard.

2 The second one is PRESENT [7]. 



Low Probability Differentials and the Cryptanalysis of Full- Round CLEFIA-128 




We further analyze the impact of the $2^{14}$ pairs of keys and the advantage of $2^7$ that we gain over generic analysis. First, we show that CLEFIA-128 instantiated with any pair of weak keys can be analyzed; namely, we present a membership test for the weak-key class. Next, for the hashing mode of CLEFIA-128, i.e., when the cipher is used in single-block-length hash constructions, we show that differential multicollisions [4] can be produced with a complexity lower than for an ideal cipher.

The paper is organized as follows. We start with a description of CLEFIA-128 in Section 2. We present the main results related to the analysis of the key schedule and the production of the class of $2^{14}$ pairs of weak keys in Section 3. The differential membership test is given in Section 4. We present the analysis of the hashing mode of the cipher in Section 5, and in Section 6 we conclude the paper.

2 Description of CLEFIA-128 

CLEFIA is a 128-bit block cipher that supports 128-, 192-, and 256-bit keys. We analyze CLEFIA with 128-bit keys, which is referred to as CLEFIA-128. Before we define the cipher, we would like to make an important note. To simplify the presentation, we consider CLEFIA-128 without whitening keys³. Our analysis applies to the original CLEFIA-128, as shown in Appendix B. We now proceed with a brief description of CLEFIA-128. It is an 18-round four-branch Feistel network (see Fig. 3 of Appendix A) that updates two words per round. A definition of the state update function is irrelevant to our analysis (see [13] for a full description), and in what follows we focus on the key schedule only.

A 128-bit master key $K$ is input to a 12-round Feistel network $GFN_{4,12}$ (with the same round function as the one in the state; refer to Fig. 3 of Appendix A), resulting in a 128-bit intermediate key $L$. All 36 round keys⁴ $RK_i$, $i = 0, \ldots, 35$, are produced by applying a linear transformation to the master key $K$ and the intermediate key $L$, as shown below ($\oplus$ stands for the XOR operation and $\|$ for concatenation):


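$$\begin{aligned}
RK_0 \| RK_1 \| RK_2 \| RK_3 &\leftarrow L \oplus S_1,\\
RK_4 \| RK_5 \| RK_6 \| RK_7 &\leftarrow \Sigma(L) \oplus K \oplus S_2,\\
RK_8 \| RK_9 \| RK_{10} \| RK_{11} &\leftarrow \Sigma^2(L) \oplus S_3,\\
RK_{12} \| RK_{13} \| RK_{14} \| RK_{15} &\leftarrow \Sigma^3(L) \oplus K \oplus S_4,\\
RK_{16} \| RK_{17} \| RK_{18} \| RK_{19} &\leftarrow \Sigma^4(L) \oplus S_5,\\
RK_{20} \| RK_{21} \| RK_{22} \| RK_{23} &\leftarrow \Sigma^5(L) \oplus K \oplus S_6,\\
RK_{24} \| RK_{25} \| RK_{26} \| RK_{27} &\leftarrow \Sigma^6(L) \oplus S_7,\\
RK_{28} \| RK_{29} \| RK_{30} \| RK_{31} &\leftarrow \Sigma^7(L) \oplus K \oplus S_8,\\
RK_{32} \| RK_{33} \| RK_{34} \| RK_{35} &\leftarrow \Sigma^8(L) \oplus S_9,
\end{aligned}$$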


3 There are four whitening keys: two are added to the plaintext, and two to the 
ciphertext. 

4 Two round keys are used in every round; thus there are $2 \cdot 18 = 36$ round keys in total.
where the $S_i$ are predefined 128-bit constants, and $\Sigma$ is a linear function defined below. In short, each four consecutive round keys $RK_{4i}, RK_{4i+1}, RK_{4i+2}, RK_{4i+3}$ are obtained as the XOR of multiple applications of $\Sigma$ to $L$, possibly the master key $K$, and the constant $S_{i+1}$. The resulting 128-bit sequence is divided into four 32-bit words, and each is assigned to one of the round key words. The linear function $\Sigma$ (illustrated in Fig. 1) is a simple 128-bit permutation used for diffusion. The function $\Sigma: \{0,1\}^{128} \to \{0,1\}^{128}$ is defined as follows:

$$X_{128} \mapsto Y_{128}, \qquad Y = X[120\text{-}64] \,\|\, X[6\text{-}0] \,\|\, X[127\text{-}121] \,\|\, X[63\text{-}7],$$

where $X[a\text{-}b]$ is the bit sequence from the $a$-th bit to the $b$-th bit of $X$.





Fig. 1. The function $\Sigma$. The numbers denote the size of the bit sequence.


We would like to make a note about the notation for XOR differences used throughout the paper. To emphasize that a difference is in the word $X$, we use $\Delta X$; otherwise, if the word is irrelevant or clear from the context, we simply use $\Delta$.

3 Weak Keys for CLEFIA-128 

In the related-key model, the security of a cipher is analyzed by comparing two encryption functions obtained with two unknown but related keys. Given a specific relation⁵ between the keys, if the pair of encryption functions differs from a pair of random permutations, then the cipher has a weakness and can be subject to related-key analysis. Sometimes the analysis is applicable only when the pairs of related keys belong to a relatively small subset of all possible pairs of keys. This subset is called the weak-key class of the cipher, and the number of pairs of keys in it is the size of the class.

We will show that a weak-key class in CLEFIA-128 consists of pairs of keys $(K, K' = K \oplus \mathcal{L}_1(D))$, where $D$ can take approximately $2^{14}$ different 128-bit values, such that for any plaintext $P$ the following relation holds:

$$E_K(P) \oplus E_{K'}(P \oplus \mathcal{L}_2(D)) = \mathcal{L}_3(D), \qquad (1)$$

5 Some relations are prohibited as they lead to trivial attacks; see [3] for details.
where $\mathcal{L}_1, \mathcal{L}_2, \mathcal{L}_3$ are linear functions defined below. The property can be seen as a related-key differential, with the difference $\mathcal{L}_1(D)$ for the master key, $\mathcal{L}_2(D)$ for the plaintext and $\mathcal{L}_3(D)$ for the ciphertext. From Equation (1), it follows that once $D$ is defined, the probability of the differential is precisely one.

In the state of CLEFIA-128, the probability of a differential characteristic is one if in each Feistel round there is no incoming difference to the non-linear round function. This happens when the differences in the state and in the round key cancel each other. Consequently, the input difference to the round function becomes zero⁶. An illustration of the technique for four rounds of CLEFIA-128 is given in Fig. 2. Notice that the input state difference at the beginning of the first round $(\Delta_1, \Delta_2, \Delta_3, \Delta_4)$ is the same as the output difference after the fourth round, i.e., it is iterative with a period of 4 rounds. Therefore, we obtain a differential characteristic with probability 1 (in the state) for the full-round CLEFIA-128 if we can produce 4-round iterative round key differences.



Fig. 2. Iterative related-key differential characteristic for 4 rounds of CLEFIA-128 that holds with probability 1. The symbols $\Delta_1, \Delta_2, \Delta_3, \Delta_4$ denote word differences.


Each round of the state uses two round keys; thus the above 4-round iterative characteristic requires the round key differences to have a period of 8, i.e., $\Delta RK_i = \Delta RK_{i+8}$. Moreover, an additional condition has to hold.

6 A similar idea is given in [5].

Note that in Fig. 2 the differences in the consecutive round keys are $(\Delta_1, \Delta_3, \Delta_2, \Delta_4, \Delta_3, \Delta_1, \Delta_4, \Delta_2)$; that is, among the 8 round key differences the first four are different, while the remaining four are only a permutation of the first four. These two conditions can be summarized as follows:

Condition 1 - For all $i$, it should hold that $\Delta RK_i = \Delta RK_{i+8}$.

Condition 2 - For all $i$ divisible by 8, it should hold that $\Delta RK_i = \Delta RK_{i+5}$, $\Delta RK_{i+1} = \Delta RK_{i+4}$, $\Delta RK_{i+2} = \Delta RK_{i+7}$, $\Delta RK_{i+3} = \Delta RK_{i+6}$. This can be rewritten as $(\Delta RK_{i+4}, \Delta RK_{i+5}, \Delta RK_{i+6}, \Delta RK_{i+7}) = \pi(\Delta RK_i, \Delta RK_{i+1}, \Delta RK_{i+2}, \Delta RK_{i+3})$, where $\pi$ is the 4-word permutation $(0,1,2,3) \to (1,0,3,2)$.

Next, we show how to find the set of differences for which the two conditions hold.

Condition 1. From the definition of the key schedule

$$\begin{aligned}
RK_{8i} \| RK_{8i+1} \| RK_{8i+2} \| RK_{8i+3} &\leftarrow \Sigma^{2i}(L) \oplus S_{2i+1},\\
RK_{8i+8} \| RK_{8i+9} \| RK_{8i+10} \| RK_{8i+11} &\leftarrow \Sigma^{2i+2}(L) \oplus S_{2i+3},
\end{aligned}$$

it follows that Condition 1 for the first 4 (out of 8) round key differences in an octet of round keys requires $\Sigma^{2i}(\Delta L) = \Sigma^{2i+2}(\Delta L)$ (the constants cancel in the differences), which, since $\Sigma$ is a permutation, can be expressed as

$$\Delta L = \Sigma^2(\Delta L). \qquad (2)$$

We obtain the same equation if we consider the remaining 4 round key differences. To satisfy Condition 1, we have to find values of $\Delta L$ such that Equation (2) holds. This can be achieved easily, as (2) is a system of 128 linear equations in 128 unknowns (refer to the definition of $\Sigma$), and it has solutions of the form (expressed as a concatenation of bit sequences)

$$\Delta L = a_1 a_2\, t\, b_2 b_1 b_2 b_1 b_2 b_1 b_2\, a_2 a_1 a_2 a_1 a_2 a_1 a_2\, t\, b_1 b_2, \qquad (3)$$

where $a_1, a_2$ are any 7-bit values, $t$ is the most significant bit of $a_1$, and the 7-bit values $b_1, b_2$ are fixed rearrangements of bits of $a_1$ and $a_2$ (each bit of $b_1$ and $b_2$ equals a single bit of $a_1$ or $a_2$). Thus there are $2^7 \cdot 2^7 = 2^{14}$ solutions.
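Because $\Sigma$ is a bit permutation, the solutions of Equation (2) can also be counted without general linear algebra: $\Delta L = \Sigma^2(\Delta L)$ holds exactly when $\Delta L$ is constant on every cycle of the bit permutation $\Sigma^2$, so there are $2^c$ solutions, with $c$ the number of cycles. The Python sketch below takes bit 127 as the most significant bit (an assumed convention), reads the bit mapping off the definition of $\Sigma$, counts the cycles of $\Sigma^2$, and samples one solution. This cycle-counting view is a swapped-in technique for illustration, not the method of the paper.

```python
import random

N = 128

def sigma_src(j):
    """Source bit of output bit j of Sigma (bit 127 = MSB), read off
    Y = X[120-64] || X[6-0] || X[127-121] || X[63-7]."""
    if j >= 71:
        return j - 7       # Y[127-71] = X[120-64]
    if j >= 64:
        return j - 64      # Y[70-64]  = X[6-0]
    if j >= 57:
        return j + 64      # Y[63-57]  = X[127-121]
    return j + 7           # Y[56-0]   = X[63-7]

def sigma(x):
    """Apply Sigma to a 128-bit integer."""
    return sum(((x >> sigma_src(j)) & 1) << j for j in range(N))

def cycles(step):
    """Cycles of the bit permutation j -> step(j) on {0, ..., 127}."""
    seen, out = set(), []
    for start in range(N):
        cyc, j = [], start
        while j not in seen:
            seen.add(j)
            cyc.append(j)
            j = step(j)
        if cyc:
            out.append(cyc)
    return out

sq_cycles = cycles(lambda j: sigma_src(sigma_src(j)))
print(len(sq_cycles), 2 ** len(sq_cycles))   # 14 16384

def random_solution(rng):
    """Random Delta L with Sigma^2(Delta L) = Delta L: one random bit per cycle of Sigma^2."""
    x = 0
    for cyc in sq_cycles:
        if rng.getrandbits(1):
            for j in cyc:
                x |= 1 << j
    return x

dL = random_solution(random.Random(1))
assert sigma(sigma(dL)) == dL
```

Under this convention, the bit permutation of $\Sigma$ consists of one cycle of length 20 and six cycles of length 18; squaring splits each of these even cycles in two, which gives the 14 cycles and hence the $2^{14}$ solutions stated above.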


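Condition 2. From the definition of the key schedule

$$\begin{aligned}
RK_{8i} \| RK_{8i+1} \| RK_{8i+2} \| RK_{8i+3} &\leftarrow \Sigma^{2i}(L) \oplus S_{2i+1},\\
RK_{8i+4} \| RK_{8i+5} \| RK_{8i+6} \| RK_{8i+7} &\leftarrow \Sigma^{2i+1}(L) \oplus K \oplus S_{2i+2},
\end{aligned}$$

and using $\Sigma^{2i}(\Delta L) = \Delta L$ from Condition 1, we see that Condition 2 can be expressed as

$$\pi(\Delta L) = \Sigma(\Delta L) \oplus \Delta K,$$

where $\pi$ is the 4-word permutation $(0,1,2,3) \to (1,0,3,2)$. Thus, when $\Delta L$ is fixed (to one of the values from (3)), the difference in the master key $\Delta K$ can be determined as

$$\Delta K = \pi(\Delta L) \oplus \Sigma(\Delta L). \qquad (4)$$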
Summary. We have shown above that Conditions 1 and 2 can be achieved simultaneously, as there are $2^{14}$ values $\Delta L_i$ (see Equation (3)) with corresponding values $\Delta K_i$ (see Equation (4)). This means that, given the difference $\Delta K_i$ in the master key and the difference $\Delta L_i$ in the intermediate key (i.e., the differential in the 12-round Feistel $GFN_{4,12}$ of the key schedule is $\Delta K_i \to \Delta L_i$), the differences in the round keys are of the requested form, as shown below:

$$\begin{aligned}
\Delta RK_0 \| \Delta RK_1 \| \Delta RK_2 \| \Delta RK_3 &= \Delta_1 \| \Delta_3 \| \Delta_2 \| \Delta_4,\\
\Delta RK_4 \| \Delta RK_5 \| \Delta RK_6 \| \Delta RK_7 &= \Delta_3 \| \Delta_1 \| \Delta_4 \| \Delta_2,\\
&\;\;\vdots\\
\Delta RK_{28} \| \Delta RK_{29} \| \Delta RK_{30} \| \Delta RK_{31} &= \Delta_3 \| \Delta_1 \| \Delta_4 \| \Delta_2,\\
\Delta RK_{32} \| \Delta RK_{33} \| \Delta RK_{34} \| \Delta RK_{35} &= \Delta_1 \| \Delta_3 \| \Delta_2 \| \Delta_4,
\end{aligned}$$

where $\Delta_1 \| \Delta_3 \| \Delta_2 \| \Delta_4 = \Delta L_i$. As a result, we have obtained the necessary differences in the round keys, and we can use the 4-round iterative characteristic from Fig. 2.
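The two conditions can be checked together on the linear part of the key schedule. The following Python sketch takes bit 127 as the most significant bit and word 0 as the most significant 32-bit word (assumed conventions; for the pairwise swap $\pi$ the word order does not matter). It picks a random $\Delta L$ fixed by $\Sigma^2$, derives $\Delta K$ from Equation (4), and expands the nine 128-bit round-key difference blocks $\Sigma^j(\Delta L)$, XORed with $\Delta K$ for odd $j$ (the constants $S_i$ cancel in differences). Every even block should equal $\Delta L$ and every odd block $\pi(\Delta L)$.

```python
import random

N, MASK32 = 128, (1 << 32) - 1

def src(j):
    """Source bit of output bit j of Sigma, bit 127 = MSB (read off the definition)."""
    if j >= 71:
        return j - 7
    if j >= 64:
        return j - 64
    if j >= 57:
        return j + 64
    return j + 7

def sigma(x):
    return sum(((x >> src(j)) & 1) << j for j in range(N))

def pi(x):
    """4-word permutation (0,1,2,3) -> (1,0,3,2) on 32-bit words."""
    w = [(x >> (96 - 32 * i)) & MASK32 for i in range(4)]
    return (w[1] << 96) | (w[0] << 64) | (w[3] << 32) | w[2]

def random_delta_l(rng):
    """Random Delta L with Sigma^2(Delta L) = Delta L (one random bit per cycle of Sigma^2)."""
    x, seen = 0, set()
    for start in range(N):
        if start in seen:
            continue
        bit, j = rng.getrandbits(1), start
        while j not in seen:
            seen.add(j)
            x |= bit << j
            j = src(src(j))
    return x

dL = random_delta_l(random.Random(7))
dK = pi(dL) ^ sigma(dL)                      # Equation (4)

# Block j covers RK_{4j}..RK_{4j+3}: difference Sigma^j(dL), plus dK when j is odd.
blocks, t = [], dL
for j in range(9):
    blocks.append(t ^ (dK if j % 2 else 0))
    t = sigma(t)

ok = all(b == (pi(dL) if j % 2 else dL) for j, b in enumerate(blocks))
print(ok)   # True: Conditions 1 and 2 hold for all 36 round keys
```

The check succeeds for every $\Delta L$ fixed by $\Sigma^2$, since each odd block equals $\Sigma(\Delta L) \oplus \pi(\Delta L) \oplus \Sigma(\Delta L) = \pi(\Delta L)$.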

Now we can easily specify the weak-key class given by Equation (1). The value of $D$ coincides with the values of $\Delta L$ from Equation (3). Therefore the first linear function $\mathcal{L}_1$ is defined as $\mathcal{L}_1(D) = \pi(D) \oplus \Sigma(D)$. The input difference in the plaintext is the same as the input difference in the first four round keys (which is again $\Delta L$), but the order of the words is slightly different: instead of $(\Delta_1, \Delta_3, \Delta_2, \Delta_4)$ it is $(\Delta_1, \Delta_2, \Delta_3, \Delta_4)$; see Fig. 2. Hence, we introduce the 4-word permutation $\pi_2: (0,1,2,3) \to (0,2,1,3)$ that corrects the order. With this notation, the second linear function $\mathcal{L}_2$ is defined as $\mathcal{L}_2(D) = \pi_2(D)$. Finally, $\mathcal{L}_3$ is defined similarly. CLEFIA-128 has 18 rounds; thus the last 4-round iterative characteristic (for rounds 17 and 18) is terminated after its second round, with an output difference $(\Delta_2, \Delta_3, \Delta_4, \Delta_1)$. It differs from $\Delta L$ only in the order of the four words; hence we introduce $\pi_3: (0,1,2,3) \to (3,1,0,2)$ and conclude that $\mathcal{L}_3(D) = \pi_3(D)$.

In the weak-key class, the pairs of keys are defined as $(K, K \oplus \pi(D) \oplus \Sigma(D))$, and for any plaintext $P$ it holds that

$$E_K(P) \oplus E_{K \oplus \pi(D) \oplus \Sigma(D)}(P \oplus \pi_2(D)) = \pi_3(D). \qquad (5)$$

A pair of keys belongs to this class if, for some of the $2^{14}$ values $D = \Delta L$ defined by Equation (3), the 12-round Feistel $GFN_{4,12}$ in the key schedule, on input difference $\Delta K = \pi(\Delta L) \oplus \Sigma(\Delta L)$, gives the output difference $\Delta L$, i.e., $GFN_{4,12}(K \oplus \pi(\Delta L) \oplus \Sigma(\Delta L)) \oplus GFN_{4,12}(K) = \Delta L$. Therefore not all of the keys $K$ have a related key and form a pair in the weak-key class, but only those for which the differential in the Feistel permutation holds.

We deal with a 12-round Feistel permutation, and thus the probability of the differential $\pi(\Delta L) \oplus \Sigma(\Delta L) \to \Delta L$ is low. We assume it is $2^{-128}$ (as proven by the designers), which is the probability of getting a fixed output difference from a fixed input difference in a random permutation. However, even when we model the Feistel permutation by a random one, there still exist $2^{14}$ key schedule differentials that have a probability of $2^{-128}$ and that result in iterative round key differences.

In CLEFIA-128, there are $2^{128}$ possible keys $K$, and therefore, for a specific value of $D$, the number of related-key pairs $(K, K \oplus \pi(D) \oplus \Sigma(D))$ is the same. The probability of the differential in the Feistel permutation is $2^{-128}$; thus, among all of these pairs, only about one will pass the differential. However, there are $2^{14}$ possible values for $D$; hence the size of the weak-key class is $2^{14}$.


4 Membership Test for the Weak-Key Class 

An analysis technique that succeeds when the related keys belong to the weak-key class is called a membership test. For the weak-key class of CLEFIA-128, the membership test is a differential distinguisher that always succeeds and whose data, time and memory complexities are equal to $2^8$. That is to say, we can decide with probability 1 whether the underlying cipher is CLEFIA-128 with weak keys or another (possibly ideal) cipher.

Given a pair of weak keys $(K, K \oplus \pi(D) \oplus \Sigma(D))$, it is easy to distinguish CLEFIA-128 (see Equation (5)) with only a single pair of related plaintexts $(P, P \oplus \pi_2(D))$, but $D$ has to be known. If it is unknown, we have to try all $2^{14}$ possible values of $D$ (as $D$ coincides with one of the $\Delta L_i$). Consequently, we would end up with a brute-force attack on the space of weak keys. To address this problem, we have to be able to detect the correct value of $\Delta L$ efficiently.

Finding the correct $\Delta L_i$ can be performed much faster if we take into account the additional properties of the difference in the intermediate key. All $2^{14}$ values of $\Delta L_i$ (see Equation (3)) can be defined as the XOR of two elements from two different sets, each of cardinality $2^7$, as shown below:

$$\Delta L_i = \Delta L_i(a_1, a_2) = G_1(a_1) \oplus G_2(a_2), \qquad a_1 = 0, \ldots, 2^7 - 1,\; a_2 = 0, \ldots, 2^7 - 1,$$

where $G_1(a_1)$ is a 128-bit word that is the same as $\Delta L$ on the bits that depend on $a_1$ and has 0's on the bits that depend on $a_2$, while $G_2(a_2)$ is the opposite, i.e., it coincides with $\Delta L$ on the bits that depend on $a_2$ and has 0's on the bits that depend on $a_1$⁷.

Using this representation helps to detect the correct $\Delta L$ by finding collisions between two specific sets. Assume the pair $(K, K' = K \oplus \pi(\Delta L) \oplus \Sigma(\Delta L))$ belongs to the weak-key class. For a randomly chosen plaintext $P$, let us define two pools, each with $2^7$ chosen plaintexts:

$$P^1_i = \pi_2(P \oplus G_1(a^i_1)), \quad a^i_1 = 0, 1, \ldots, 2^7 - 1,$$
$$P^2_i = \pi_2(P \oplus G_2(a^i_2)), \quad a^i_2 = 0, 1, \ldots, 2^7 - 1.$$


7 Recall that each bit of $b_i$ and $t$ is equal to a single bit of either $a_1$ or $a_2$.
Next, we obtain two pools of ciphertexts with $(K, K')$ as encryption keys, i.e., $C^1_i = E_K(P^1_i)$ and $C^2_i = E_{K'}(P^2_i)$. Finally, we compute two sets $V^1, V^2$:

$$V^1 = \{V^1_i \mid V^1_i = \pi_2^{-1}(P^1_i) \oplus \pi_3^{-1}(C^1_i)\},$$
$$V^2 = \{V^2_i \mid V^2_i = \pi_2^{-1}(P^2_i) \oplus \pi_3^{-1}(C^2_i)\}.$$

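The crucial observation is that the sets $V^1$ and $V^2$ always collide, i.e., there exist $V^1_i$ and $V^2_j$ such that $V^1_i = V^2_j$. This follows from the sequence below, which uses the linearity of the word permutations $\pi_2^{-1}$ and $\pi_3^{-1}$:

$$\begin{aligned}
V^1_i \oplus V^2_j &= \pi_2^{-1}(P^1_i) \oplus \pi_3^{-1}(C^1_i) \oplus \pi_2^{-1}(P^2_j) \oplus \pi_3^{-1}(C^2_j)\\
&= \pi_2^{-1}(P^1_i \oplus P^2_j) \oplus \pi_3^{-1}\big(E_K(P^1_i) \oplus E_{K'}(P^2_j)\big)\\
&= \pi_2^{-1}\big(\pi_2(G_1(a^i_1) \oplus G_2(a^j_2))\big) \oplus \pi_3^{-1}\big(E_K(P^1_i) \oplus E_{K'}(P^1_i \oplus \pi_2(G_1(a^i_1) \oplus G_2(a^j_2)))\big)\\
&= \Delta L' \oplus \pi_3^{-1}\big(E_K(P^1_i) \oplus E_{K'}(P^1_i \oplus \pi_2(\Delta L'))\big),
\end{aligned}$$

where $\Delta L' = G_1(a^i_1) \oplus G_2(a^j_2)$. Note that $\Delta L'$ can take all $2^{14}$ possible values (as $a^i_1$ and $a^j_2$ each take all $2^7$ values), and therefore for some particular $i, j$ it must coincide with $\Delta L$. In such a case, the difference in the plaintext is $\pi_2(\Delta L)$, and thus for the ciphertext we obtain

$$E_K(P^1_i) \oplus E_{K'}(P^1_i \oplus \pi_2(\Delta L)) = \pi_3(\Delta L).$$

Then $V^1_i \oplus V^2_j = \Delta L \oplus \pi_3^{-1}(\pi_3(\Delta L)) = 0$.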

The possibility of creating the sets independently and then finding a collision between them is the main idea of the membership test on CLEFIA-128. It works according to the following steps.

1. Choose a plaintext $P$ at random.

2. Create a pool of $2^7$ plaintexts $P^1_{a_1} = \pi_2(P \oplus G^1(a_1))$ and ask for the corresponding ciphertexts $C^1_{a_1}$ obtained by encryption under the first key, i.e., $C^1_{a_1} = E_K(P^1_{a_1})$. Compute the set $V^1$ composed of the elements $V^1_{a_1} = \pi_2^{-1}(P^1_{a_1}) \oplus \pi_3^{-1}(C^1_{a_1})$.

3. Create a pool of $2^7$ plaintexts $P^2_{a_2} = \pi_2(P \oplus G^2(a_2))$ and ask for the corresponding ciphertexts $C^2_{a_2}$ obtained by encryption under the second key, i.e., $C^2_{a_2} = E_{K'}(P^2_{a_2})$. Compute the set $V^2$ composed of the elements $V^2_{a_2} = \pi_2^{-1}(P^2_{a_2}) \oplus \pi_3^{-1}(C^2_{a_2})$.

4. Check for collisions between $V^1$ and $V^2$. If such a collision exists, output that the examined cipher is CLEFIA-128. Otherwise, output that it is an ideal cipher.
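The four steps can be exercised end to end on a toy scale. The Python sketch below is not CLEFIA-128: it replaces the cipher by a mock oracle in which the second key is constructed so that $E_K(x) \oplus E_{K'}(x \oplus \pi_2(\Delta L)) = \pi_3(\Delta L)$ holds by design, it uses hypothetical 8-bit words with 5-bit $a_1, a_2$, and it assumes a made-up bit map for $G^1, G^2$ (bit $j$ of $\Delta L$ copies bit $j \bmod 10$ of the concatenation of $a_1$ and $a_2$). The helper names (`prf`, `weak_pair_oracle`, `membership_test`) are our own.

```python
import hashlib
import random

WORD = 8     # hypothetical toy word size (CLEFIA-128 uses 32-bit words)
A_BITS = 5   # hypothetical size of a_1 and a_2 (CLEFIA-128 uses 7)
MASK = (1 << WORD) - 1


def split(x):
    return [(x >> (WORD * (3 - i))) & MASK for i in range(4)]


def join(w):
    return (w[0] << 3 * WORD) | (w[1] << 2 * WORD) | (w[2] << WORD) | w[3]


def pi2(x):  # a|b|c|d -> a|c|b|d; an involution, so pi2^{-1} = pi2
    a, b, c, d = split(x)
    return join([a, c, b, d])


def pi3(x):  # a|b|c|d -> c|b|d|a
    a, b, c, d = split(x)
    return join([c, b, d, a])


def pi3_inv(x):  # c|b|d|a -> a|b|c|d
    w = split(x)
    return join([w[3], w[1], w[0], w[2]])


def G1(a1):  # hypothetical map: the bits of Delta L that copy a bit of a_1
    p = 2 * A_BITS
    return sum(((a1 >> (j % p)) & 1) << j for j in range(4 * WORD) if j % p < A_BITS)


def G2(a2):  # hypothetical map: the bits of Delta L that copy a bit of a_2
    p = 2 * A_BITS
    return sum(((a2 >> (j % p - A_BITS)) & 1) << j for j in range(4 * WORD) if j % p >= A_BITS)


def prf(key, x):
    """Keyed stand-in for a block cipher on 4 * WORD = 32-bit blocks."""
    digest = hashlib.blake2b(x.to_bytes(4, "big"), key=key, digest_size=4).digest()
    return int.from_bytes(digest, "big")


def weak_pair_oracle(key, dl):
    """Mock (not CLEFIA): E_K(x) ^ E_K'(x ^ pi2(dl)) == pi3(dl) for every x."""
    def e1(x):
        return prf(key, x)

    def e2(x):
        return prf(key, x ^ pi2(dl)) ^ pi3(dl)

    return e1, e2


def membership_test(e1, e2, p):
    """Steps 2-4: return the (a1, a2) of a V^1/V^2 collision, or None."""
    v1 = {}
    for a1 in range(1 << A_BITS):                       # step 2
        p1 = pi2(p ^ G1(a1))
        v1[pi2(p1) ^ pi3_inv(e1(p1))] = a1
    for a2 in range(1 << A_BITS):                       # steps 3 and 4
        p2 = pi2(p ^ G2(a2))
        v2 = pi2(p2) ^ pi3_inv(e2(p2))
        if v2 in v1:
            return v1[v2], a2
    return None


rng = random.Random(2014)
s1, s2 = rng.randrange(1 << A_BITS), rng.randrange(1 << A_BITS)
e1, e2 = weak_pair_oracle(b"toy-key", G1(s1) ^ G2(s2))
p = rng.getrandbits(4 * WORD)                           # step 1
print(membership_test(e1, e2, p), (s1, s2))
```

With a weak pair, the collision recovers $(a_1, a_2)$ and hence $\Delta L$; with two independent keyed functions no collision is expected, since each of the $2^{10}$ candidate pairs collides only with probability about $2^{-32}$.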

The total data complexity of the membership test is $2^7 + 2^7 = 2^8$ plaintexts. The time complexity of each of steps 2 and 3 is $2^7$ encryptions, while the collision at step 4 can be found with $2^7$ operations and $2^7$ memory, used to store one of the sets $V^1$ or $V^2$. Therefore, given a pair of keys from the weak-key class, we can distinguish CLEFIA-128 with $2^8$ data, time, and memory.
To confirm the correctness of the membership test, we implemented it for a small-scale variant of CLEFIA-128. Each word was shrunk to an 8-bit value, so the whole state became 32 bits. The S-box from AES was taken as the round function $F$, and random 8-bit values were chosen as constants. The chunks in the linear function $\Sigma$ were taken of sizes 5 and 11 (compared to 7 and 57 in the original version). The expected size of the weak-key class in this toy version is $2^{10}$ (because $X = \Sigma^2(X)$ has $2^{10}$ solutions), while in practice we obtained $960 \approx 2^{9.9}$ solutions. For a random key pair chosen from this class, we were able to distinguish the cipher after $2^6$ encryptions, which largely confirms our findings.


5 Analysis of the Hashing Modes of CLEFIA-128 

In this section we analyze the impact of the weak-key class on hashing modes of CLEFIA-128. We show that compression functions built upon single-block-length modes instantiated with CLEFIA-128 exhibit non-random properties that come in the form of differential multicollisions. The analysis of the hashing modes of a cipher is usually reduced to finding open-key distinguishers for the cipher. Note that open-key distinguishers come in two forms: known-key (the adversary knows the key but cannot control it) and chosen-key (the adversary can choose the value of the key). Our analysis applies to the second case, i.e., we show non-randomness of the hashing modes of CLEFIA-128 when the adversary can control the key.

First, let us find a pair of keys $(K_1, K_2)$ that belongs to the weak-key class. We stress that the task is to find the pair explicitly, i.e., to produce the two values that compose a weak-key pair. From the previous analysis we have seen that a pair is a weak-key pair if, for one of the $2^{14}$ values of $\Delta L$ defined previously: 1) the difference $\Delta K = K_1 \oplus K_2$ satisfies $\Delta K = \pi(\Delta L) \oplus \Sigma(\Delta L)$, and 2) the 12-round Feistel network in the key schedule, $GFN_{4,12}$, produces the output difference $\Delta L$, i.e., $GFN_{4,12}(K_1) \oplus GFN_{4,12}(K_2) = \Delta L$. The two conditions can be restated as the search for a pair that satisfies the differential $\pi(\Delta L) \oplus \Sigma(\Delta L) \rightarrow \Delta L$ through the 12-round Feistel network in the key schedule.

Recall that the difference $\Delta L$ is the XOR of two elements, $G^1(a_1)$ and $G^2(a_2)$, taken from sets of size $2^7$, i.e., $\Delta L = G^1(a_1) \oplus G^2(a_2)$. Therefore we get:

$$\begin{aligned}
\Delta K &= \pi(\Delta L) \oplus \Sigma(\Delta L) = \pi(G^1(a_1) \oplus G^2(a_2)) \oplus \Sigma(G^1(a_1) \oplus G^2(a_2)) \\
&= [\pi(G^1(a_1)) \oplus \Sigma(G^1(a_1))] \oplus [\pi(G^2(a_2)) \oplus \Sigma(G^2(a_2))] \\
&= T^1(a_1) \oplus T^2(a_2),
\end{aligned}$$

where $T^1(a_1) = \pi(G^1(a_1)) \oplus \Sigma(G^1(a_1))$ and $T^2(a_2) = \pi(G^2(a_2)) \oplus \Sigma(G^2(a_2))$ are two linear functions (as $\pi$, $\Sigma$, $G^1$, $G^2$ are linear), and therefore the difference in the keys of a weak-key pair is also the XOR of one element from each of two sets. Using this fact, we can find a weak-key pair as follows:
1. Create a set $\Delta\mathcal{K}$ of the $2^{14}$ values $T^1(a_1) \oplus T^2(a_2)$, $a_1 = 0, \ldots, 2^7 - 1$, $a_2 = 0, \ldots, 2^7 - 1$.

2. Randomly choose a key $K$.

3. Create a set $V_1$ of $2^7$ pairs

$$\big(K_1,\ K_1 \oplus \pi(GFN_{4,12}(K_1)) \oplus \Sigma(GFN_{4,12}(K_1))\big),$$

where $K_1 = K \oplus T^1(a_1)$, $a_1 = 0, \ldots, 2^7 - 1$. Index the set $V_1$ by the second elements.

4. Create a set $V_2$ of $2^7$ pairs

$$\big(K_2,\ K_2 \oplus \pi(GFN_{4,12}(K_2)) \oplus \Sigma(GFN_{4,12}(K_2))\big),$$

where $K_2 = K \oplus T^2(a_2)$, $a_2 = 0, \ldots, 2^7 - 1$. Index $V_2$ by the second elements as well.

5. Check for collisions between $V_1$ and $V_2$ on the second (indexed) elements. If such a collision exists, confirm that the key pair is weak by checking whether the XOR difference of the first elements belongs to $\Delta\mathcal{K}$. If so, output the pair $(K_1, K_2)$ and exit. Otherwise, go to step 2.
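The five steps form a meet-in-the-middle collision search. The Python sketch below runs them on a deliberately tiny, hypothetical instance so that a weak pair appears after roughly a thousand random keys: 16-bit keys split into four 4-bit words, a 16-bit rotation standing in for $\Sigma$, a hash-based function standing in for $GFN_{4,12}$, 3-bit $a_1, a_2$, and a made-up bit map for $G^1, G^2$. None of these are CLEFIA's real components, and all function names are ours. Beyond the check of step 5, the sketch confirms the Feistel output difference directly, and it skips the trivial collision $K_1 = K_2$ that arises when $T^1(a_1) = T^2(a_2)$ (e.g., $a_1 = a_2 = 0$). Step 4 is merged into step 5: each $K_2$ is looked up as soon as it is computed.

```python
import hashlib
import random

N = 16        # hypothetical toy key/state width (CLEFIA-128: 128)
A = 3         # hypothetical size of a_1 and a_2 (CLEFIA-128: 7)
W = N // 4    # word width
MASK_W, MASK_N = (1 << W) - 1, (1 << N) - 1


def pi(x):
    """Word permutation a|b|c|d -> b|a|d|c."""
    a, b, c, d = ((x >> (W * (3 - i))) & MASK_W for i in range(4))
    return (b << 3 * W) | (a << 2 * W) | (d << W) | c


def sigma(x):
    """Hypothetical linear stand-in for Sigma: rotate left by 5 bits."""
    return ((x << 5) | (x >> (N - 5))) & MASK_N


def gfn(k):
    """Hypothetical stand-in for GFN_{4,12}: a hash-based function on N bits."""
    digest = hashlib.blake2b(k.to_bytes(N // 8, "big"), digest_size=N // 8).digest()
    return int.from_bytes(digest, "big")


def G1(a1):
    p = 2 * A
    return sum(((a1 >> (j % p)) & 1) << j for j in range(N) if j % p < A)


def G2(a2):
    p = 2 * A
    return sum(((a2 >> (j % p - A)) & 1) << j for j in range(N) if j % p >= A)


def T(g):
    """pi(g) ^ Sigma(g), applied to G^1(a_1) or G^2(a_2)."""
    return pi(g) ^ sigma(g)


def tag(k):
    """Second element of the pairs in V_1 and V_2."""
    f = gfn(k)
    return k ^ pi(f) ^ sigma(f)


def find_weak_pair(rng, max_iters=1 << 16):
    # Step 1: key differences T^1(a_1) ^ T^2(a_2), mapped to their Delta L.
    table = {T(G1(a1)) ^ T(G2(a2)): G1(a1) ^ G2(a2)
             for a1 in range(1 << A) for a2 in range(1 << A)}
    for _ in range(max_iters):
        k = rng.getrandbits(N)                                    # step 2
        v1 = {tag(k ^ T(G1(a1))): k ^ T(G1(a1)) for a1 in range(1 << A)}  # step 3
        for a2 in range(1 << A):                                  # steps 4 and 5
            k2 = k ^ T(G2(a2))
            k1 = v1.get(tag(k2))
            if k1 is None or k1 == k2:
                continue
            dl = table.get(k1 ^ k2)
            if dl is not None and gfn(k1) ^ gfn(k2) == dl:
                return k1, k2, dl
    return None


print(find_weak_pair(random.Random(122)))
```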

The above algorithm outputs a correct weak-key pair after repeating steps 2-5 around $2^{114}$ times. For each randomly chosen key $K$, there are $2^{14}$ pairs of keys $(K_1, K_2)$ with difference $K_1 \oplus K_2 = K \oplus T^1(a_1) \oplus K \oplus T^2(a_2) = T^1(a_1) \oplus T^2(a_2) = \pi(\Delta L_i) \oplus \Sigma(\Delta L_i)$. If the output difference of the 12-round Feistel network is precisely the same $\Delta L_i$ (an event that happens with probability $2^{-128}$), i.e., if $GFN_{4,12}(K_1) \oplus GFN_{4,12}(K_2) = \Delta L_i$, then

$$\pi\big(GFN_{4,12}(K_1) \oplus GFN_{4,12}(K_2)\big) \oplus \Sigma\big(GFN_{4,12}(K_1) \oplus GFN_{4,12}(K_2)\big) = \pi(\Delta L_i) \oplus \Sigma(\Delta L_i),$$

and therefore

$$K_1 \oplus K_2 = \pi\big(GFN_{4,12}(K_1) \oplus GFN_{4,12}(K_2)\big) \oplus \Sigma\big(GFN_{4,12}(K_1) \oplus GFN_{4,12}(K_2)\big),$$

which is equivalent to

$$K_1 \oplus \pi(GFN_{4,12}(K_1)) \oplus \Sigma(GFN_{4,12}(K_1)) = K_2 \oplus \pi(GFN_{4,12}(K_2)) \oplus \Sigma(GFN_{4,12}(K_2)).$$

Therefore a collision between $V_1$ and $V_2$ suggests a possible weak-key pair. The suggested pair is a weak-key pair only if the input and output differences satisfy the differential, i.e., with probability $2^{-128}$. As we take $2^{114}$ random keys $K$, and for each there are $2^{14}$ pairs, with overwhelming probability one of them will be a weak-key pair. To avoid false positives, we add step 1 and the additional check at step 5, i.e., we make sure that the difference between the keys is $\pi(\Delta L_i) \oplus \Sigma(\Delta L_i)$ for one of the $2^{14}$ good values of $\Delta L_i$. Hence, the algorithm produces a weak-key pair in $2^{14} + 2^{114} \times 2 \times 2^7 \approx 2^{122}$ time and $2^{14}$ memory.
We can use the found pair to show the weakness of CLEFIA-128 when it is used for cryptographic hashing. More precisely, we consider hashing based on single-block-length⁸ modes, where a compression function is built from a block cipher. If the compression function uses CLEFIA-128, then we can find a pair of weak keys in $2^{122}$ time using the described algorithm. Once such a pair $(K_1, K_2)$ is found, we can produce any number of differential multicollisions [4] for any of the 12 modes investigated by Preneel et al. [12], including the popular Davies-Meyer and Matyas-Meyer-Oseas modes. For instance, for the Davies-Meyer mode, i.e., when the compression function $C(H, M)$ is defined as $C(H, M) = E_M(H) \oplus H$, the differential multicollisions have the form

$$\begin{aligned}
&C(H_i, K_1) \oplus C\big(H_i \oplus \pi_2(\Delta L),\ K_1 \oplus \pi(\Delta L) \oplus \Sigma(\Delta L)\big) \\
&= E_{K_1}(H_i) \oplus H_i \oplus E_{K_1 \oplus \pi(\Delta L) \oplus \Sigma(\Delta L)}(H_i \oplus \pi_2(\Delta L)) \oplus H_i \oplus \pi_2(\Delta L) \\
&= E_{K_1}(H_i) \oplus E_{K_1 \oplus \pi(\Delta L) \oplus \Sigma(\Delta L)}(H_i \oplus \pi_2(\Delta L)) \oplus \pi_2(\Delta L) \\
&= \pi_3(\Delta L) \oplus \pi_2(\Delta L),
\end{aligned}$$

for $i = 0, 1, \ldots$. Note that we do not need to call the compression function to know this output difference, since $C(H_i, K_1) \oplus C(H_i \oplus \pi_2(\Delta L), K_1 \oplus \pi(\Delta L) \oplus \Sigma(\Delta L)) = \pi_3(\Delta L) \oplus \pi_2(\Delta L)$ holds as long as $(K_1, K_1 \oplus \pi(\Delta L) \oplus \Sigma(\Delta L))$ forms a weak-key pair. Consequently, we can produce an arbitrary number of differential multicollisions with complexity $2^{122}$. On the other hand, the proven lower bound (see [4]) in the case of an ideal cipher is $2^{128}$.

A distinguisher for hashing based on CLEFIA-128 has already been presented by Aoki at ISITA'12 [2]. It works in the framework of middletext distinguishers [11] (an open-key version of the integral attack), where the adversary starts with a set of particularly chosen states in the middle of the cipher, then from them (and the knowledge of the key) produces the set of plaintexts and the set of ciphertexts, and finally shows that these two sets have some property that cannot be easily reproduced if the cipher were ideal. For CLEFIA-128, Aoki showed how to choose $2^{112}$ starting middle states that result in a 17-round middletext distinguisher, and then added one more round, using subkey guesses, to obtain an 18-round distinguisher. We want to point out that there is a substantial difference between our result and that of Aoki. We fix the values of neither the plaintexts nor the ciphertexts, and our analysis is applicable as long as the pair of chaining values has the required difference; the values can be arbitrary and even unknown.
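The Davies-Meyer computation above can be checked mechanically. The Python sketch below does not implement CLEFIA-128: it uses a hypothetical 128-bit mock cipher (a keyed BLAKE2b stand-in) defined only for two keys, where the second key is built so that $E_{K_1}(x) \oplus E_{K_2}(x \oplus \pi_2(\Delta L)) = \pi_3(\Delta L)$ holds by construction; the key relation $K_2 = K_1 \oplus \pi(\Delta L) \oplus \Sigma(\Delta L)$ is not modeled. It shows how the Davies-Meyer feed-forward turns this relation into the fixed output difference $\pi_3(\Delta L) \oplus \pi_2(\Delta L)$ for every chaining value $H_i$.

```python
import hashlib
import random

MASK32 = (1 << 32) - 1


def words(x):
    return [(x >> (96 - 32 * i)) & MASK32 for i in range(4)]


def from_words(w):
    return (w[0] << 96) | (w[1] << 64) | (w[2] << 32) | w[3]


def pi2(x):  # a|b|c|d -> a|c|b|d
    a, b, c, d = words(x)
    return from_words([a, c, b, d])


def pi3(x):  # a|b|c|d -> c|b|d|a
    a, b, c, d = words(x)
    return from_words([c, b, d, a])


def mock_cipher(k1, k2, dl):
    """Hypothetical E(key, x): only k1 and k2 are defined, and they form a weak pair."""
    def prf(x):
        h = hashlib.blake2b(x.to_bytes(16, "big"), key=k1.to_bytes(16, "big"), digest_size=16)
        return int.from_bytes(h.digest(), "big")

    def enc(key, x):
        if key == k1:
            return prf(x)
        if key == k2:
            return prf(x ^ pi2(dl)) ^ pi3(dl)
        raise ValueError("the mock only defines the two weak keys")

    return enc


def davies_meyer(enc, h, m):
    """C(H, M) = E_M(H) xor H."""
    return enc(m, h) ^ h


rng = random.Random(7)
k1, k2, dl = (rng.getrandbits(128) for _ in range(3))
enc = mock_cipher(k1, k2, dl)
target = pi3(dl) ^ pi2(dl)
for _ in range(5):
    h = rng.getrandbits(128)
    diff = davies_meyer(enc, h, k1) ^ davies_meyer(enc, h ^ pi2(dl), k2)
    print(diff == target)  # True for every chaining value
```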

6 Conclusion 

The analysis of CLEFIA-128 presented in this paper shows the existence of a weak-key class that consists of $2^{14}$ pairs of keys. We have shown how to exploit the pairs in two different scenarios: the hashing mode of CLEFIA-128 and a membership test for the weak-key class. In the hashing mode (or open-key mode in general), we have shown that a weak-key pair can be found in around $2^{122}$ time, and such a pair can be used to produce differential multicollisions faster than the generic $2^{128}$. Furthermore, we have shown a membership test for the weak-key class that has $2^8$ time and data complexity, compared to the generic $2^{14}$. The main ideas of the analysis have been verified with computer experiments on small-scale variants of CLEFIA-128.

8 The state and key sizes in CLEFIA-128 coincide, thus we can construct only single-block-length compression functions.

The analysis is invariant to three important security features that presumably increase the strength of a cipher. First, the non-linear part of the key schedule can be any random permutation (not necessarily a 12-round Feistel network). Our analysis would still work, as we do not need high-probability differentials for this permutation. Second, the state update functions (in CLEFIA-128, $F_0$ and $F_1$ are one-round substitution-permutation networks) can be arbitrary functions or permutations, including several SP layers: the difference never goes into them, hence the probability of the characteristic in the state would stay 1. Finally, the number of rounds in CLEFIA-128 plays absolutely no role in our analysis: even if CLEFIA-128 had 1000 rounds, the complexity of the analysis would stay the same.

To prevent analyses such as ours in the future, we have to clearly understand the main drawbacks of the design. The weak-key class and the three analysis invariances are results of these drawbacks (not their cause) and provide clues on what the actual cause might be. The invariance to the state update function is due to the Feistel structure of the cipher: this construction can lead to probability-1 characteristics, as it can cancel round key and state differences. To maintain the cancellation through an arbitrary number of rounds (invariance to the number of rounds), the round key differences have to be iterative. The key schedule prevents high-probability iterative (or any fixed-value) differences, as they have to be produced from a difference in the key that initially goes through a 12-round Feistel network modeled as a random permutation. The Feistel network, however, produces low-probability ($2^{-128}$) differences (invariance to the random permutation), and $2^{14}$ of them become iterative round key differences due to the linear function used after the Feistel network. That is, because of the linear function, with probability $2^{-128}$ we can have a special type of difference in 36 round keys (1152 bits!). Therefore, the analysis of CLEFIA-128 holds due to the Feistel structure of the cipher and the weak linear function that is used to produce the round keys.

To conclude, our work shows that low-probability differentials (around $2^{-k}$ for a cipher with a $k$-bit key and an $n$-bit state) for the key schedule of Feistel ciphers cannot be used as the sole proof of resistance against related-key differential analysis. A safe upper bound on the probability of such differentials, which proves and provides security against related-key analysis, is not $2^{-k}$ but $2^{-2k-n}$; this comes from the fact that there can be as many as $2^{2k}$ pairs of weak keys, and their combined probability should be below $2^{-n}$.
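As a worked instance of this bound (our own arithmetic, reading "combined probability" as the sum over all weak-key pairs, i.e., a union bound):

$$2^{2k} \cdot p \leq 2^{-n} \iff p \leq 2^{-2k-n}, \qquad \text{e.g., } k = n = 128 \implies p \leq 2^{-384},$$

which for CLEFIA-128 is far below the $2^{-128}$ differential probability of the 12-round Feistel network discussed above.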
A Specification of CLEFIA-128

B Analysis of CLEFIA-128 with Whitening Keys 

The whitening keys are the four words $WK_i$, $i = 0, 1, 2, 3$, defined as $WK_0 | WK_1 | WK_2 | WK_3 = K$, i.e., they are the words of the master key $K$. The first two are XORed to the second and the fourth plaintext words, and the remaining two to the second and the fourth ciphertext words (see Fig. 3).

To index the whitening words, we define two linear functions on 128-bit words (or four 32-bit words). Assume $X$ is a 128-bit word such that $X = a|b|c|d$, where $a, b, c, d$ are 32-bit words. Then $l(X): \{0,1\}^{128} \rightarrow \{0,1\}^{128}$ is defined as $l(X) = l(a|b|c|d) = 0|a|0|b$. Similarly, $r(X): \{0,1\}^{128} \rightarrow \{0,1\}^{128}$ is defined as $r(X) = r(a|b|c|d) = 0|c|0|d$.

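Now we can easily specify the weak-key class:

- the key difference remains the same,

- the plaintext difference, instead of $\pi_2(\Delta L)$, should be $\pi_2(\Delta L) \oplus l(\Delta K)$,

- the ciphertext difference, instead of $\pi_3(\Delta L)$, should be $\pi_3(\Delta L) \oplus r(\Delta K)$.

As $\Delta K = \pi(\Delta L) \oplus \Sigma(\Delta L)$, it follows that the weak-key class for the original CLEFIA-128 is defined as the $2^{14}$ pairs of keys $(K, K \oplus \pi(\Delta L) \oplus \Sigma(\Delta L))$ such that for any plaintext $P$ the following holds:

$$E_K(P) \oplus E_{K \oplus \pi(\Delta L) \oplus \Sigma(\Delta L)}\big(P \oplus \pi_2(\Delta L) \oplus l(\pi(\Delta L) \oplus \Sigma(\Delta L))\big) = \pi_3(\Delta L) \oplus r(\pi(\Delta L) \oplus \Sigma(\Delta L)).$$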

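Let us focus on the membership test. We define the plaintext pools as:

$$P^1_{a_1} = P \oplus \pi_2(G^1(a_1)) \oplus l(T^1(a_1)), \qquad a_1 = 0, 1, \ldots, 2^7 - 1,$$

$$P^2_{a_2} = P \oplus \pi_2(G^2(a_2)) \oplus l(T^2(a_2)), \qquad a_2 = 0, 1, \ldots, 2^7 - 1.$$

This way, the difference between any two plaintexts from the two different pools is $\pi_2(\Delta L') \oplus l(\Delta K')$, where $\Delta L' = G^1(a_1) \oplus G^2(a_2)$ and $\Delta K' = T^1(a_1) \oplus T^2(a_2)$, i.e., it is as required by the class.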

To define the sets $V^1, V^2$ that lead to a collision, we first have to understand how a collision can occur. In the previous membership test (on CLEFIA-128 without whitening keys), we used the trick that the difference in both the plaintext and the ciphertext is $\Delta L$, but with permuted words (that is why we applied $\pi_2^{-1}$ and $\pi_3^{-1}$). Here this is not the case: in the plaintext the difference is $\Delta L$ plus two more words of $\Delta K$, while in the ciphertext it is $\Delta L$ plus the remaining two words of $\Delta K$. Hence, the XOR of these values does not trivially produce zero, as the two words from $l$ and the two from $r$ are different.
Fig. 3. The encryption function of CLEFIA-128 at the left, and the key schedule at the right. $P_0, P_1, P_2, P_3$ are 32-bit plaintext words, $C_0, C_1, C_2, C_3$ are the ciphertext words, $K_0, K_1, K_2, K_3$ are the key words, $RK_i$ and $WK_j$ are the round and whitening keys, respectively, and $S_i$ are 128-bit constants. Finally, $F_0, F_1$ are the two state update functions, while $\Sigma$ is a linear function (permutation).

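Nevertheless, we can achieve collisions. Assume $\Delta L = a|b|c|d$. Then the difference $\Delta_P$ in the plaintext is

$$\begin{aligned}
\Delta_P &= \pi_2(a|b|c|d) \oplus l\big(\pi(a|b|c|d) \oplus \Sigma(a|b|c|d)\big) \\
&= a|c|b|d \oplus l(b|a|d|c) \oplus l(\Sigma(a|b|c|d)) \\
&= (a \,|\, c \oplus b \,|\, b \,|\, d \oplus a) \oplus l(\Sigma(a|b|c|d)).
\end{aligned}$$

Note that $l(\Sigma(a|b|c|d))$ has zeros in the first and the third words.

Similarly, the difference $\Delta_C$ in the ciphertext is

$$\begin{aligned}
\Delta_C &= \pi_3(a|b|c|d) \oplus r\big(\pi(a|b|c|d) \oplus \Sigma(a|b|c|d)\big) \\
&= c|b|d|a \oplus r(b|a|d|c) \oplus r(\Sigma(a|b|c|d)) \\
&= (c \,|\, b \oplus d \,|\, d \,|\, a \oplus c) \oplus r(\Sigma(a|b|c|d)).
\end{aligned}$$

Again, in the sum, $r$ influences only the second and the fourth words.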


Low Probability Differentials and the Cryptanalysis of Full- Round CLEFIA-128 




Let us introduce a function $f$ that acts on the four 32-bit words of a 128-bit state and XORs the first word to the fourth word, and the third word to the second word, i.e., $f(x|y|z|t) = x \,|\, y \oplus z \,|\, z \,|\, t \oplus x$. Then

$$f(\Delta_P) = a|c|b|d \oplus l(\Sigma(a|b|c|d)),$$

$$f(\Delta_C) = c|b|d|a \oplus r(\Sigma(a|b|c|d)).$$

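The function $\Sigma$ is linear and therefore $\Sigma(a|b|c|d) = \Sigma(a|0|0|0) \oplus \Sigma(0|b|0|0) \oplus \Sigma(0|0|c|0) \oplus \Sigma(0|0|0|d)$. Let us denote these four values by $\Sigma_a$, $\Sigma_b$, $\Sigma_c$, and $\Sigma_d$. Furthermore, with superscripts we denote the four 32-bit words of $\Sigma_x$, e.g., $\Sigma_a^1$ is the first (most significant) word of $\Sigma_a$. This allows us to remove the functions $l$ and $r$ from the terms, and as a result we obtain

$$f(\Delta_P) = a \,|\, c \oplus \Sigma_a^1 \oplus \Sigma_b^1 \oplus \Sigma_c^1 \oplus \Sigma_d^1 \,|\, b \,|\, d \oplus \Sigma_a^2 \oplus \Sigma_b^2 \oplus \Sigma_c^2 \oplus \Sigma_d^2,$$

$$f(\Delta_C) = c \,|\, b \oplus \Sigma_a^3 \oplus \Sigma_b^3 \oplus \Sigma_c^3 \oplus \Sigma_d^3 \,|\, d \,|\, a \oplus \Sigma_a^4 \oplus \Sigma_b^4 \oplus \Sigma_c^4 \oplus \Sigma_d^4.$$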

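Next, we define a function $g(x|y|z|t)$ that from $x, z$ computes $\Sigma_x^1, \ldots, \Sigma_x^4$ and $\Sigma_z^1, \ldots, \Sigma_z^4$ (here $\Sigma_x = \Sigma(x|0|0|0)$ and $\Sigma_z = \Sigma(0|z|0|0)$, matching the positions of $a$ and $b$ in $\Delta L$), and adds $\Sigma_x^4, \Sigma_z^4$ to the first word, $\Sigma_x^1, \Sigma_z^1$ to the second, $\Sigma_x^3, \Sigma_z^3$ to the third, and $\Sigma_x^2, \Sigma_z^2$ to the fourth. Similarly, for $\Delta_C$ we define $h(x|y|z|t)$ that from $x, z$ computes $\Sigma_x^1, \ldots, \Sigma_x^4$ and $\Sigma_z^1, \ldots, \Sigma_z^4$ (here $\Sigma_x = \Sigma(0|0|x|0)$ and $\Sigma_z = \Sigma(0|0|0|z)$, matching the positions of $c$ and $d$), and adds $\Sigma_x^1, \Sigma_z^1$ to the first word, $\Sigma_x^3, \Sigma_z^3$ to the second, $\Sigma_x^2, \Sigma_z^2$ to the third, and $\Sigma_x^4, \Sigma_z^4$ to the fourth. Thus we get

$$g(f(\Delta_P)) = a \oplus \Sigma_a^4 \oplus \Sigma_b^4 \,|\, c \oplus \Sigma_c^1 \oplus \Sigma_d^1 \,|\, b \oplus \Sigma_a^3 \oplus \Sigma_b^3 \,|\, d \oplus \Sigma_c^2 \oplus \Sigma_d^2,$$

$$h(f(\Delta_C)) = c \oplus \Sigma_c^1 \oplus \Sigma_d^1 \,|\, b \oplus \Sigma_a^3 \oplus \Sigma_b^3 \,|\, d \oplus \Sigma_c^2 \oplus \Sigma_d^2 \,|\, a \oplus \Sigma_a^4 \oplus \Sigma_b^4.$$

Obviously $h(f(\Delta_C)) = \pi_4(g(f(\Delta_P)))$, where $\pi_4: (0, 1, 2, 3) \rightarrow (3, 0, 1, 2)$. Therefore the sets $V^1, V^2$ are defined as:

$$V^1 = \{V^1_{a_1} \mid V^1_{a_1} = \pi_4(g(f(P^1_{a_1}))) \oplus h(f(C^1_{a_1}))\},$$

$$V^2 = \{V^2_{a_2} \mid V^2_{a_2} = \pi_4(g(f(P^2_{a_2}))) \oplus h(f(C^2_{a_2}))\},$$

and a collision between these two sets suggests that $\Delta L'$ coincides with $\Delta L$. Thus the membership test for CLEFIA-128 with whitening keys has the same complexity as before (without whitening).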
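Because $f$, $g$, $h$ and $\Sigma$ are all linear, the identity $h(f(\Delta_C)) = \pi_4(g(f(\Delta_P)))$ can be spot-checked numerically. The Python sketch below assumes the DoubleSwap definition of $\Sigma$ from the CLEFIA specification, Σ(X) = X[7-63] | X[121-127] | X[0-6] | X[64-120] with X[0] the most significant bit; the check itself only relies on $\Sigma$ being linear. Note that $g$ reads its two words from positions 0 and 2 of its input but evaluates $\Sigma$ at the positions of $a$ and $b$ in $\Delta L$ (0 and 1), and $h$ likewise at the positions of $c$ and $d$ (2 and 3). All function names, including `l_map` and `r_map` for $l$ and $r$, are ours.

```python
import random

MASK32 = (1 << 32) - 1


def bits(x, i, j):
    """Bits X[i..j] of a 128-bit word X, where X[0] is the most significant bit."""
    return (x >> (127 - j)) & ((1 << (j - i + 1)) - 1)


def sigma(x):
    """DoubleSwap as we read the CLEFIA spec: X[7-63] | X[121-127] | X[0-6] | X[64-120]."""
    return ((bits(x, 7, 63) << 71) | (bits(x, 121, 127) << 64)
            | (bits(x, 0, 6) << 57) | bits(x, 64, 120))


def to_words(x):
    return tuple((x >> (96 - 32 * i)) & MASK32 for i in range(4))


def from_words(w):
    return (w[0] << 96) | (w[1] << 64) | (w[2] << 32) | w[3]


def xor(u, v):
    return tuple(p ^ q for p, q in zip(u, v))


def pi(w):     # a|b|c|d -> b|a|d|c
    return (w[1], w[0], w[3], w[2])


def pi2(w):    # a|b|c|d -> a|c|b|d
    return (w[0], w[2], w[1], w[3])


def pi3(w):    # a|b|c|d -> c|b|d|a
    return (w[2], w[1], w[3], w[0])


def pi4(w):    # word 0 -> position 3, 1 -> 0, 2 -> 1, 3 -> 2
    return (w[1], w[2], w[3], w[0])


def l_map(w):  # l(a|b|c|d) = 0|a|0|b
    return (0, w[0], 0, w[1])


def r_map(w):  # r(a|b|c|d) = 0|c|0|d
    return (0, w[2], 0, w[3])


def f(w):      # f(x|y|z|t) = x | y^z | z | t^x
    return (w[0], w[1] ^ w[2], w[2], w[3] ^ w[0])


def sigma_at(word, pos):
    """Words of Sigma applied to the state holding `word` at `pos` and zeros elsewhere."""
    state = [0, 0, 0, 0]
    state[pos] = word
    return to_words(sigma(from_words(state)))


def g(w):
    sx, sz = sigma_at(w[0], 0), sigma_at(w[2], 1)   # Sigma_a, Sigma_b
    return (w[0] ^ sx[3] ^ sz[3], w[1] ^ sx[0] ^ sz[0],
            w[2] ^ sx[2] ^ sz[2], w[3] ^ sx[1] ^ sz[1])


def h(w):
    sx, sz = sigma_at(w[0], 2), sigma_at(w[2], 3)   # Sigma_c, Sigma_d
    return (w[0] ^ sx[0] ^ sz[0], w[1] ^ sx[2] ^ sz[2],
            w[2] ^ sx[1] ^ sz[1], w[3] ^ sx[3] ^ sz[3])


def whitened_deltas(dl):
    """Plaintext and ciphertext differences of the class with whitening keys."""
    dk = xor(pi(dl), to_words(sigma(from_words(dl))))
    return xor(pi2(dl), l_map(dk)), xor(pi3(dl), r_map(dk))


def identity_holds(dl):
    dp, dc = whitened_deltas(dl)
    return h(f(dc)) == pi4(g(f(dp)))


rng = random.Random(0)
print(all(identity_holds(tuple(rng.getrandbits(32) for _ in range(4)))
          for _ in range(1000)))  # True
```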
Abstract. We propose two systematic methods to describe the differential property of an S-box with linear inequalities based on logical condition modelling and computational geometry, respectively. In one method, inequalities are generated according to some conditional differential properties of the S-box; in the other method, inequalities are extracted from the H-representation of the convex hull of all possible differential patterns of the S-box. For the second method, we develop a greedy algorithm for selecting a given number of inequalities from the convex hull. Using these inequalities combined with the Mixed-integer Linear Programming (MILP) technique, we propose an automatic method for evaluating the security of bit-oriented block ciphers against the (related-key) differential attack with several techniques for obtaining tighter security bounds, and a new tool for finding (related-key) differential characteristics automatically for bit-oriented block ciphers.

Keywords: Automatic cryptanalysis, Related-key differential attack, 
Mixed-integer Linear Programming, Convex hull. 


1 Introduction 

Differential cryptanalysis [7] is one of the most well-known attacks on modern block ciphers, and many cryptanalytic techniques have been developed from it, such as the truncated differential attack [34], the impossible differential attack [9], and the boomerang attack [51]. Providing a security evaluation with respect to the differential attack has become a basic requirement for a newly designed practical block cipher to be accepted by the cryptographic community.

* An extended version of this paper containing more applications and the source code 
is available at http://eprint.iacr.org/2013/676 

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, PART I, LNCS 8873, pp. 158-178, 2014.

(c) International Association for Cryptologic Research 2014 
Contrary to the single-key model, where methodologies for constructing block ciphers provably resistant to differential attacks are readily available, the understanding of the security of block ciphers with regard to related-key differential attacks is relatively limited. This understanding has been greatly improved in recent years for AES-like byte- or word-oriented SPN block ciphers. Along this line of research, two representative papers [10,25] were published at Eurocrypt 2010 and Crypto 2013. In the former paper [10], an efficient search tool for finding differential characteristics both in the state and in the key was presented, and the best differential characteristics were obtained for some byte-oriented block ciphers such as AES, byte-Camellia, and Khazad. In the latter paper [25], Pierre-Alain Fouque et al. showed that the full-round AES-128 cannot be proven secure against differential attacks in the related-key model unless the exact coefficients of the MDS matrix and the S-box differential properties are taken into account. Moreover, a variant of Dijkstra's shortest path algorithm for finding the most efficient related-key attacks on SPN ciphers was developed in [25]. In [27], Ivica Nikolic presented a tweak for the key schedule of AES; the resulting cipher, called xAES, is resistant against the related-key differential attacks found in AES.

For bit-oriented block ciphers such as PRESENT-80 and DES, Sareh Emami et al. proved that no related-key differential characteristic exists with probability higher than $2^{-64}$ for the full-round PRESENT-80, and therefore argued that PRESENT-80 is secure against basic related-key differential attacks [22]. In [48], Sun et al. obtained tighter security bounds for PRESENT-80 with respect to related-key differential attacks using the Mixed-integer Linear Programming (MILP) technique. Alex Biryukov and Ivica Nikolic proposed two methods [11] based on Matsui's tool [42] for finding related-key differential characteristics for DES-like ciphers. For their methods, they stated that “... our approaches can be used as well to search for high probability related-key differential characteristics in any bit-oriented ciphers with linear key schedule”.

Sareh Emami et al.'s method [22] and Sun et al.'s method [48] cannot be used to search for actual (related-key) differential characteristics, and Alex Biryukov et al.'s method [11] is only applicable to ciphers with a linear key schedule.

In this paper, we provide a method based on MILP which can not only evaluate the security (obtain a security bound) of a block cipher with respect to the (related-key) differential attack, but is also able to search for actual (related-key) differential characteristics even if the key schedule algorithm of the block cipher is nonlinear.

The MILP problem is a class of optimization problems derived from Linear Programming, in which the aim is to optimize an objective function under certain constraints. Despite its intimate relationship with discrete optimization problems such as the set covering problem, the 0-1 knapsack problem, and the traveling salesman problem, it is only in recent years that MILP has been explicitly applied in cryptographic research [1,17,18,36,46,52,57].

In this paper, we are mainly concerned with the application of the MILP method in (related-key) differential cryptanalysis. A practical approach to evaluate the security of a cipher against the differential attack is to determine a lower bound on the number of active S-boxes throughout the cipher. This strategy has been employed in many designs [4,8,15,16,19]. MILP was applied in automatically determining the lower bounds of the numbers of active S-boxes for some word-oriented symmetric-key ciphers, and therefore used to prove their security against differential cryptanalysis [14,44,54]. Laura Winnen [53] and Sun et al. [48] extended this method by making it applicable to ciphers involving bit-oriented operations. We notice that such MILP tools [14,44,48,54] for counting the minimum number of active S-boxes are also applied or mentioned in the design and analysis of some authenticated encryption schemes [8,20,21,29,30,31,55,58].

Our Contributions. We find that the constraints presented in [48] are too 
coarse to accurately describe the differential properties of a specific cipher, since 
there are a large number of invalid differential patterns of the cipher satisfying 
all these constraints, which yields a feasible region of the MILP problem much 
larger than the set of all valid differential characteristics. 

In this paper, we propose two methods to tighten the feasible region by cutting 
off some impossible differential patterns of a specific S-box with linear inequali- 
ties: one method is based on logical condition modeling, and the other is a more 
general approach based on convex hull computation — a fundamental algorith- 
mic problem in computational geometry. 
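The simplest example of a valid cutting-off inequality removes a single impossible 0-1 pattern $p$: the inequality $\sum_{i: p_i = 1} (1 - z_i) + \sum_{i: p_i = 0} z_i \geq 1$ states that the Hamming distance between $z$ and $p$ is at least 1, so it excludes exactly $p$ and no other 0-1 point. This is a generic textbook cut, not either of the two methods proposed here. The Python sketch below builds one such cut per impossible (input, output) difference pattern of the PRESENT S-box (our choice of example), which gives a sense of how many inequalities such a naive approach would need; the helper names are ours.

```python
from itertools import product

# PRESENT S-box, used only as a concrete 4-bit example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def pattern(din, dout):
    """0-1 vector (x_0..x_3, y_0..y_3) of an input/output difference pair."""
    return tuple((din >> k) & 1 for k in range(4)) + tuple((dout >> k) & 1 for k in range(4))


possible = {pattern(a ^ b, SBOX[a] ^ SBOX[b]) for a in range(16) for b in range(16)}
impossible = [p for p in product((0, 1), repeat=8) if p not in possible]


def no_good_cut(p):
    """(coeffs, rhs) for sum(coeffs[i] * z[i]) >= rhs, violated only by z == p."""
    coeffs = tuple(-1 if bit else 1 for bit in p)
    return coeffs, 1 - sum(p)


def satisfies(z, cut):
    coeffs, rhs = cut
    return sum(c * zi for c, zi in zip(coeffs, z)) >= rhs


cuts = [no_good_cut(p) for p in impossible]
print(len(possible), len(impossible))
```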

However, the second approach produces too many inequalities so that adding 
all of them to an MILP problem will make the solving process impractical. 
Therefore, we develop a greedy algorithm for selecting a given number of linear 
inequalities from the convex hull. 

By adding all or a part of the constraints generated by these methods, we 
provide MILP based methods for evaluating the security of a block cipher with 
respect to the (related-key) differential attack, and searching for actual (related- 
key) differential characteristics. Using these methods, we obtain the following 
results. 

1. The probability of the best related-key differential characteristic of the 24-round PRESENT-80 is upper bounded by $2^{-64}$, which is the tightest security bound obtained so far for PRESENT-80.

2. The probability of the best related-key differential characteristic for the full-round LBlock is at most $2^{-60}$.

3. We obtain a single-key differential characteristic and a single-key differential for the 15-round SIMON48 (a lightweight block cipher designed by the U.S. National Security Agency) with probability $2^{-46}$ and $2^{-41.96}$, respectively, which are the best results published so far for SIMON48.

4. We obtain a 14-round related-key differential characteristic of LBlock with probability $2^{-49}$ in no more than 4 hours on a PC. Note that the probabilities of the best previously published related-key characteristics covering the 13- and 14-round LBlock are $2^{-53}$ and $2^{-65}$ [56], respectively.

5. We obtain an 8-round related-key differential characteristic of DESL with probability $2^{-34.78}$ in 10 minutes on a PC. To the best of our knowledge, no related-key differential characteristic covering more than 7 rounds of DESL has been published before.

6. We obtain a 7-round related-key characteristic for PRESENT-128 with probability $2^{-11}$ and 0 active S-boxes in its key schedule algorithm, based on which an improved related-key boomerang distinguisher for the 14-round PRESENT-128 and a key-recovery attack on the 17-round PRESENT-128 can be constructed by using exactly the same method presented in m-

The method presented in this paper is generic, automatic, and applicable to other lightweight ciphers with bit-oriented operations. Due to the page limit, the concrete results concerning the related-key or single-key differential characteristics for LBlock, PRESENT-128, and DES(L) are put into an extended version of this paper available at http://eprint.iacr.org/2013/676.

Organization of the Paper. In Sect. 2, we introduce Mouha et al.'s framework and its extension for counting the number of active S-boxes of bit-oriented ciphers automatically with the MILP technique. In Sect. 3, we introduce the concept of valid cutting-off inequalities for tightening the feasible region of an MILP problem, and explore how to generate and select valid cutting-off inequalities. We present the methods for automatic security evaluation with respect to the (related-key) differential attack, and for searching for (related-key) differential characteristics, in Sect. 4 and Sect. 5. In Sect. 6, we conclude the paper and propose some research directions for bit-oriented ciphers and the application of the MILP technique in cryptography. The application of the methods presented in this paper to PRESENT, LBlock, and SIMON is given in the Appendices.

2 Mouha et al.'s Framework and Its Extension

2.1 Mouha et al.'s Framework for Word-Oriented Block Ciphers

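Assume a cipher is composed of the following three word-oriented operations, where $\omega$ is the word size:

- XOR, $\oplus: \mathbb{F}_2^{\omega} \times \mathbb{F}_2^{\omega} \rightarrow \mathbb{F}_2^{\omega}$;

- Linear transformation $L: \mathbb{F}_2^{m\omega} \rightarrow \mathbb{F}_2^{m\omega}$ with branch number $\mathcal{B}_L$;

- S-box, $S: \mathbb{F}_2^{\omega} \rightarrow \mathbb{F}_2^{\omega}$.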

Mouha et al.'s framework uses 0-1 variables, which are subject to certain constraints imposed by the above operations, to denote the word-level differences propagating through the cipher (1 for a nonzero difference and 0 otherwise).

Firstly, we should include the constraints imposed by the operations of the 
cipher. 

Constraints Imposed by XOR Operations. Suppose $a \oplus b = c$, where $a, b, c \in \mathbb{F}_2^{\omega}$ are the input and output differences of the XOR operation (and $a$, $b$, $c$ also denote the corresponding 0-1 variables). The following constraints ensure that when $a$, $b$, and $c$ are not all zero, at least two of them are nonzero:

$$\begin{cases} a + b + c \geq 2 d_{\oplus} \\ d_{\oplus} \geq a,\ d_{\oplus} \geq b,\ d_{\oplus} \geq c \end{cases} \qquad (1)$$

where $d_{\oplus}$ is a dummy variable taking values in $\{0, 1\}$. If each of $a$, $b$, and $c$ represents one bit, we should also add the inequality $a + b + c \leq 2$.
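To see that the XOR constraints (1) encode exactly the intended patterns, the Python sketch below enumerates all activity patterns $(a, b, c) \in \{0,1\}^3$ and asks whether some value of the dummy $d_{\oplus}$ satisfies the constraints; the helper name `xor_feasible` is ours. At bit level (with $a + b + c \leq 2$) the feasible patterns are exactly those with $a \oplus b \oplus c = 0$; at word level, $(1, 1, 1)$ is additionally allowed, since three nonzero words can XOR to zero.

```python
from itertools import product


def xor_feasible(a, b, c, bit_level):
    """True if some dummy d in {0, 1} satisfies the XOR constraints for (a, b, c)."""
    for d in (0, 1):
        ok = a + b + c >= 2 * d and d >= a and d >= b and d >= c
        if bit_level:
            ok = ok and a + b + c <= 2   # extra inequality for single bits
        if ok:
            return True
    return False


patterns = list(product((0, 1), repeat=3))
bit_ok = [p for p in patterns if xor_feasible(*p, bit_level=True)]
word_ok = [p for p in patterns if xor_feasible(*p, bit_level=False)]
print(bit_ok)   # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(word_ok)  # [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
```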

Constraints Imposed by Linear Transformation. Let $x_{i_k}$ and $y_{j_k}$, $k \in \{0, 1, \ldots, m-1\}$, be 0-1 variables denoting the word-level input and output differences of the linear transformation $L$, respectively. Since for nonzero input differences there are in total at least $\mathcal{B}_L$ nonzero $\omega$-bit words in the input and output differences, we include the following constraints:

$$\begin{cases} \sum_{k=0}^{m-1} (x_{i_k} + y_{j_k}) \geq \mathcal{B}_L d_L \\ d_L \geq x_{i_k},\ d_L \geq y_{j_k}, \quad k \in \{0, \ldots, m-1\} \end{cases} \qquad (2)$$

where $d_L$ is a dummy variable taking values in $\{0, 1\}$ and $\mathcal{B}_L$ is the branch number of the linear transformation.
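A small enumeration makes the effect of the linear-transformation constraints concrete. Assuming, for illustration only, $m = 4$ words and $\mathcal{B}_L = 5$ (the branch number of an MDS matrix on four words, as in AES MixColumns), the Python sketch below counts the word-activity patterns that admit a feasible $d_L$: the all-zero pattern plus every pattern with at least five active words among the eight, i.e., $1 + \binom{8}{5} + \binom{8}{6} + \binom{8}{7} + \binom{8}{8} = 94$. The helper names are ours.

```python
from itertools import product


def linear_feasible(xs, ys, branch):
    """Is there a dummy d_L in {0, 1} satisfying the constraints for this pattern?"""
    return any(sum(xs) + sum(ys) >= branch * d and all(d >= v for v in xs + ys)
               for d in (0, 1))


def count_feasible(m, branch):
    """Number of (input, output) word-activity patterns admitted for m words."""
    return sum(linear_feasible(bits[:m], bits[m:], branch)
               for bits in product((0, 1), repeat=2 * m))


print(count_feasible(4, 5))  # 94
```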

Then, we set up the objective function to be the sum of all variables representing the input words of the S-boxes.


2.2 Extension of Mouha et al.'s Framework for Bit-Oriented Ciphers

For bit-oriented ciphers, bit-level representations and additional constraints are needed [45]. For every input and output bit-level difference, a new 0-1 variable $x_i$ is introduced such that $x_i = 1$ if and only if the difference at this bit is nonzero.

For every S-box in the schematic diagram, including the encryption process and the key schedule algorithm, we introduce a new 0-1 variable $A_j$ such that $A_j = 1$ if the input word of the S-box is nonzero and $A_j = 0$ otherwise.

At this point, it is natural to choose the objective function $f$, which will be minimized, as $\sum_j A_j$, with the goal of determining a lower bound on the number of active S-boxes.

For bit-oriented ciphers, we need to include two sets of constraints. The first one is the set of constraints imposed by XOR operations, and the other is due to the S-box operation. After changing the representations to bit level, the constraints imposed by XOR operations for bit-oriented ciphers are the same as those presented in (1). The S-box operation is trickier.

Constraints Describing the S-box Operation. Suppose $(x_{i_0}, \ldots, x_{i_{\omega-1}})$ and $(y_{j_0}, \ldots, y_{j_{\nu-1}})$ are the input and output bit-level differences of an $\omega \times \nu$ S-box marked by $A_t$. Firstly, to ensure that $A_t = 1$ holds if and only if $x_{i_0}, \ldots, x_{i_{\omega-1}}$ are not all zero, we require that:

$$\begin{cases} A_t - x_{i_k} \geq 0, \quad k \in \{0, \ldots, \omega-1\} \\ x_{i_0} + x_{i_1} + \cdots + x_{i_{\omega-1}} - A_t \geq 0 \end{cases} \qquad (3)$$

For bijective S-boxes, a nonzero input difference must result in a nonzero output difference and vice versa:

$$\begin{cases} \omega y_{j_0} + \omega y_{j_1} + \cdots + \omega y_{j_{\nu-1}} - (x_{i_0} + x_{i_1} + \cdots + x_{i_{\omega-1}}) \geq 0 \\ \nu x_{i_0} + \nu x_{i_1} + \cdots + \nu x_{i_{\omega-1}} - (y_{j_0} + y_{j_1} + \cdots + y_{j_{\nu-1}}) \geq 0 \end{cases} \qquad (4)$$
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Note that the above constraints should not be used for non-bijective S-box such 
as the S-box of DES(L) [37] . 
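Constraints (3) and (4) can be sanity-checked the same way by exhaustion. The sketch below (plain Python; helper names are illustrative) uses the standard PRESENT S-box table as an example, with bits ordered MSB first so that $x_0$ is the most significant bit (our convention here). It checks that (3) admits exactly $A_t = 1$ for a nonzero input and $A_t = 0$ otherwise, and that every differential pattern occurring for this bijective S-box satisfies (4) with $\omega = \nu = 4$.

```python
from itertools import product

# PRESENT S-box (4x4), standard lookup table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def bits(v, n=4):
    """MSB-first bit tuple: bits(v)[0] is the most significant bit."""
    return tuple((v >> (n - 1 - i)) & 1 for i in range(n))

def constraint3(x, a):
    """(3): A_t - x_{i_k} >= 0 for all k, and sum_k x_{i_k} - A_t >= 0."""
    return all(a - xk >= 0 for xk in x) and sum(x) - a >= 0

def constraint4(x, y, w=4, v=4):
    """(4): w * sum(y) - sum(x) >= 0 and v * sum(x) - sum(y) >= 0."""
    return w * sum(y) - sum(x) >= 0 and v * sum(x) - sum(y) >= 0

# (3) leaves exactly one admissible A_t per input pattern, namely OR(x).
for x in product((0, 1), repeat=4):
    assert [a for a in (0, 1) if constraint3(x, a)] == [int(any(x))]

# Every pattern that actually occurs for PRESENT satisfies (4).
valid = {bits(a) + bits(SBOX[u] ^ SBOX[u ^ a])
         for a in range(16) for u in range(16)}
assert all(constraint4(p[:4], p[4:]) for p in valid)
```

Conversely, (4) rejects a nonzero input with a zero output, e.g. `constraint4((1, 0, 0, 0), (0, 0, 0, 0))` is `False`.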

Finally, the Hamming weight of the $(\omega+\nu)$-bit word $x_{i_0}\cdots x_{i_{\omega-1}}y_{j_0}\cdots y_{j_{\nu-1}}$ is lower bounded by the branch number $\mathcal{B}_S$ of the S-box for a nonzero input difference $x_{i_0}\cdots x_{i_{\omega-1}}$, where $d_S$ is a dummy variable:

$$\begin{cases} \sum_{k=0}^{\omega-1} x_{i_k} + \sum_{k=0}^{\nu-1} y_{j_k} \ge \mathcal{B}_S\, d_S,\\ d_S \ge x_{i_k},\ d_S \ge y_{j_t},\quad k \in \{0,\dots,\omega-1\},\ t \in \{0,\dots,\nu-1\}, \end{cases} \qquad (5)$$

where the branch number $\mathcal{B}_S$ of an S-box $S$ is defined as $\mathcal{B}_S = \min_{a \ne b}\{\mathrm{wt}((a \oplus b)\,\|\,(S(a) \oplus S(b))) : a, b \in \mathbb{F}_2^{\omega}\}$, and $\mathrm{wt}(\cdot)$ is the standard Hamming weight of an $(\omega+\nu)$-bit word. We point out that constraint (5) is redundant for an invertible S-box with branch number $\mathcal{B}_S = 2$, since in this particular case, all differential patterns not satisfying (5) violate (4).
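The branch number can be computed straight from this definition by exhausting all pairs $a \ne b$. In the plain-Python sketch below, the PRESENT S-box table and the 4-bit identity map are example inputs of our choosing: the first gives $\mathcal{B}_S = 3$, so the redundancy remark above does not apply to it, while the identity map attains the minimum value 2 for an invertible S-box.

```python
# PRESENT S-box (4x4), standard lookup table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def branch_number(sbox, n_in, n_out):
    """B_S = min over a != b of wt((a xor b) || (S(a) xor S(b)))."""
    best = n_in + n_out
    for a in range(1 << n_in):
        for b in range(a + 1, 1 << n_in):
            w = bin(a ^ b).count("1") + bin(sbox[a] ^ sbox[b]).count("1")
            best = min(best, w)
    return best

print(branch_number(SBOX, 4, 4))             # 3
print(branch_number(list(range(16)), 4, 4))  # 2
```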

0-1 Variables. The MILP model proposed above is indeed a Pure Integer Pro- 
gramming Problem since all variables appearing are 0-1 variables. However, in 
practice we only need to explicitly restrict a part of all variables to be 0-1, 
while all other variables can be allowed to be any real numbers, which leads to 
an MILP problem. Following this approach, the MILP solving process may be 
accelerated as suggested in H3 


3 Tighten the Feasible Region with Valid Cutting-off Inequalities

The feasible region of an MILP problem is defined as the set of all variable assignments satisfying all constraints in the MILP problem. The modelling process presented in the previous sections indicates that every differential path corresponds to a solution in the feasible region of the MILP problem. However, a feasible solution of the MILP model is not guaranteed to be a valid differential path, since our constraints are far from sufficient to rule out all invalid differential patterns. For instance, assume $x_i$ and $y_i$ ($0 \le i \le 3$) are the bit-level input and output differences of the PRESENT-80 S-box. According to Sect. 2.2, $x_i$ and $y_i$ are subject to the constraints (3), (4) and (5). Obviously, $(x_0,\dots,x_3,y_0,\dots,y_3) = (1,0,0,1,1,0,1,1)$ satisfies the above constraints, whereas 0x9 = 1001 → 0xB = 1011 is not a valid difference propagation pattern for the PRESENT S-box, as can be seen from the differential distribution table of the PRESENT S-box. Hence, we are actually minimizing the number of active S-boxes over a larger region, and the optimum value obtained in this setting must be smaller than or equal to the actual minimum number of active S-boxes. Although this fact does not invalidate the lower bound obtained from our MILP model, it prevents designers and analysts from obtaining tighter security bounds and valid (related-key) differential characteristics from the feasible region.
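The counterexample is easy to reproduce. The plain-Python sketch below builds the differential distribution table (DDT) of the PRESENT S-box, shows that the entry for 0x9 → 0xB is zero, and checks that the point (1, 0, 0, 1, 1, 0, 1, 1) still satisfies (3), (4) and (5) with $A_t = d_S = 1$. It assumes $\mathcal{B}_S = 3$ for the PRESENT S-box, which is what exhausting the definition of $\mathcal{B}_S$ gives.

```python
# PRESENT S-box (4x4), standard lookup table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def ddt(sbox, n=4):
    """DDT: table[a][b] = #{u : S(u) xor S(u xor a) == b}."""
    size = 1 << n
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for u in range(size):
            table[a][sbox[u] ^ sbox[u ^ a]] += 1
    return table

T = ddt(SBOX)
print(T[0x9][0xB], T[0x9][0xE])  # 0 4

# The 0-1 point from the text: input 0x9, output 0xB, MSB first.
x, y = (1, 0, 0, 1), (1, 0, 1, 1)
A_t, d_S, B_S = 1, 1, 3  # B_S = 3 assumed for PRESENT (see lead-in)
c3 = all(A_t >= xk for xk in x) and sum(x) >= A_t
c4 = 4 * sum(y) >= sum(x) and 4 * sum(x) >= sum(y)
c5 = sum(x) + sum(y) >= B_S * d_S and all(d_S >= v for v in x + y)
print(c3, c4, c5)  # True True True
```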
The situation would be even worse when modelling an invertible S-box with branch number $\mathcal{B}_S = 2$, which is the minimal value of the branch number for an invertible S-box. In the case of an invertible S-box with $\mathcal{B}_S = 2$, the constraints (3) and (4) are all that is used, since (5) is redundant, so the model carries even less information about the S-box.

Therefore, we are motivated to look for linear inequalities which can cut off 
some part of the feasible region of the MILP model while leaving the region of 
valid differential characteristics intact. For the convenience of discussion, we give 
the following definition. 

Definition 1. A valid cutting-off inequality is a linear inequality which is satisfied by all possible valid differential patterns, but is violated by at least one feasible solution corresponding to an impossible differential pattern in the feasible region of the original MILP problem.

3.1 Methods for Generating Valid Cutting-Off Inequalities 

In this section, we present two methods for generating valid cutting-off inequal- 
ities by analyzing the differential behavior of the underlying S-box. 

Modelling Conditional Differential Behaviour. In building integer programming models in practice, it is sometimes possible to model certain logical constraints as linear inequalities. For example, assume $x$ is a continuous variable such that $0 \le x \le M$, where $M$ is a fixed integer, and we know that $\delta$ is a 0-1 variable taking value 1 when $x > 0$, that is, $x > 0 \Rightarrow \delta = 1$. It is easy to verify that the above logical condition can be achieved by imposing the constraint $x - M\delta \le 0$.
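A quick exhaustive check of this big-M construction is sketched below in plain Python. For simplicity it only tries integer values of $x$ (the text allows any real $x \in [0, M]$, and the same reasoning applies), and $M = 5$ is an arbitrary example bound.

```python
M = 5  # example upper bound on x (arbitrary choice)

def big_m_ok(x, delta, M=M):
    """The constraint x - M * delta <= 0."""
    return x - M * delta <= 0

for x in range(M + 1):
    allowed = [d for d in (0, 1) if big_m_ok(x, d)]
    # x > 0 forces delta = 1; x = 0 leaves delta free.
    assert allowed == ([0, 1] if x == 0 else [1])
```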

In fact, a surprisingly large number of different types of logical conditions can be imposed in a similar way. We now give a theorem which will be used in the following.

Theorem 1. If we assume that all variables are 0-1 variables, then the logical condition that $(x_0,\dots,x_{m-1}) = (\delta_0,\dots,\delta_{m-1}) \in \{0,1\}^m \subseteq \mathbb{Z}^m$ implies $y = \delta \in \{0,1\} \subseteq \mathbb{Z}$ can be described by the following linear inequality

$$\sum_{i=0}^{m-1}(-1)^{\delta_i}x_i + (-1)^{\delta+1}y - \delta + \sum_{i=0}^{m-1}\delta_i \ge 0, \qquad (6)$$

where $\delta_i, \delta$ are fixed constants and $\mathbb{Z}$ is the set of all integers.

Proof. We only prove the theorem for the case $\delta = 0$; the case $\delta = 1$ can be proved in a similar way. We assume

$$(\delta_0,\dots,\delta_{m-1}) = (\delta_0,\dots,\delta_{s_1-1};\delta_{s_1},\dots,\delta_{m-1}) = (1,1,\dots,1;0,0,\dots,0) = \Delta^*.$$

Any other 0-1 pattern can be permuted into this form, which does not affect the proof.

Firstly, $(\Delta^*, 0)$ satisfies (6), which can be verified directly.

Secondly, we prove that all vectors $(x_0,\dots,x_{m-1},y) \in \{0,1\}^{m+1}$ with $(x_0,\dots,x_{m-1}) \ne \Delta^*$ satisfy (6). In such cases, either $x_i = 0$ for some $i < s_1$ or $x_i = 1$ for some $i \ge s_1$, so that

$$\sum_{i=0}^{m-1}(-1)^{\delta_i}x_i + (-1)^{\delta+1}y - \delta + \sum_{i=0}^{m-1}\delta_i = -\sum_{i=0}^{s_1-1}x_i + \sum_{i=s_1}^{m-1}x_i - y - 0 + s_1 \ge 0$$

for $y = 0$ or $y = 1$.

Finally, we prove that the vector $(x_0,\dots,x_{m-1},y) = (\Delta^*, 1)$ does not satisfy the inequality. In this case, we have

$$\sum_{i=0}^{m-1}(-1)^{\delta_i}x_i + (-1)^{\delta+1}y - \delta + \sum_{i=0}^{m-1}\delta_i = -\sum_{i=0}^{s_1-1}x_i + \sum_{i=s_1}^{m-1}x_i - 1 - 0 + s_1 = -1 < 0.$$

The proof is completed.
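Theorem 1 is easy to apply mechanically. The plain-Python sketch below (function names are ours) produces the coefficients of (6) as (x-coefficients, y-coefficient, constant) and lists, by exhaustion, the 0-1 points that violate it. For $(\delta_0,\dots,\delta_3) = (1,0,0,1)$ and $\delta = 0$ it yields $-x_0 + x_1 + x_2 - x_3 - y + 2 \ge 0$, and the single violating point is $(\Delta^*, 1-\delta)$, as the proof predicts.

```python
from itertools import product

def theorem1_inequality(deltas, delta):
    """Coefficients of (6): sum (-1)^d_i x_i + (-1)^(delta+1) y - delta + sum d_i >= 0."""
    x_coefs = [(-1) ** d for d in deltas]
    y_coef = (-1) ** (delta + 1)
    const = -delta + sum(deltas)
    return x_coefs, y_coef, const

def violators(deltas, delta):
    """All 0-1 points (x_0, ..., x_{m-1}, y) that violate inequality (6)."""
    x_coefs, y_coef, const = theorem1_inequality(deltas, delta)
    m = len(deltas)
    bad = []
    for point in product((0, 1), repeat=m + 1):
        x, y = point[:m], point[m]
        if sum(c * v for c, v in zip(x_coefs, x)) + y_coef * y + const < 0:
            bad.append(point)
    return bad

print(theorem1_inequality((1, 0, 0, 1), 0))  # ([-1, 1, 1, -1], -1, 2)
print(violators((1, 0, 0, 1), 0))            # [(1, 0, 0, 1, 1)]
```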

For example, the PRESENT S-box has the following conditional differential properties [26,32,38,33,18], which are referred to as undisturbed bits in [50].

Fact 1. The S-box of PRESENT-80 has the following properties:

(i) 1001 → ***0: If the input difference of the S-box is 0x9 = 1001, then the least significant bit of the output difference must be 0;

(ii) 0001 → ***1 and 1000 → ***1: If the input difference of the S-box is 0x1 = 0001 or 0x8 = 1000, then the least significant bit of the output difference must be 1;

(iii) ***1 → 0001 and ***1 → 0100: If the output difference of the S-box is 0x1 = 0001 or 0x4 = 0100, then the least significant bit of the input difference must be 1; and

(iv) ***0 → 0101: If the output difference of the S-box is 0x5 = 0101, then the least significant bit of the input difference must be 0.

From Theorem 1, we have the following fact. 

Fact 2. Let 0-1 variables $(x_0,x_1,x_2,x_3)$ and $(y_0,y_1,y_2,y_3)$ represent the input and output bit-level differences of the S-box respectively, where $x_3$ and $y_3$ are the least significant bits. Then the logical conditions in Fact 1 can be described by the following linear inequalities:

$$-x_0 + x_1 + x_2 - x_3 - y_3 + 2 \ge 0 \qquad (7)$$

$$\begin{cases} x_0 + x_1 + x_2 - x_3 + y_3 \ge 0\\ -x_0 + x_1 + x_2 + x_3 + y_3 \ge 0 \end{cases} \qquad (8)$$

$$\begin{cases} x_3 + y_0 + y_1 + y_2 - y_3 \ge 0\\ x_3 + y_0 - y_1 + y_2 + y_3 \ge 0 \end{cases} \qquad (9)$$

$$-x_3 + y_0 - y_1 + y_2 - y_3 + 2 \ge 0 \qquad (10)$$
For example, the linear inequality (7) removes all differential patterns of the form $(x_0,\dots,x_3,y_0,\dots,y_3) = (1,0,0,1,*,*,*,1)$, where $(x_0,\dots,x_3)$ and $(y_0,\dots,y_3)$ are the input and output differences of the PRESENT S-box respectively. We call the group of constraints presented in (7), (8), (9) and (10) the constraints of conditional differential propagation (CDP constraints for short). The CDP constraints obtained from Fact 1 and the differential patterns removed by these CDP constraints are given in Table 1.


Table 1. Impossible differential patterns removed by the CDP constraints generated according to the differential properties of the PRESENT S-box. Here, a vector $(\lambda_0,\dots,\lambda_3,\gamma_0,\dots,\gamma_3,\theta)$ in the left column denotes a linear inequality $\lambda_0x_0 + \cdots + \lambda_3x_3 + \gamma_0y_0 + \cdots + \gamma_3y_3 + \theta \ge 0$.

Constraint obtained by logical condition modelling | Impossible differential patterns removed
(-1, 1, 1, -1, 0, 0, 0, -1, 2) | (1,0,0,1,0,0,0,1), (1,0,0,1,0,0,1,1), (1,0,0,1,0,1,0,1), (1,0,0,1,0,1,1,1), (1,0,0,1,1,0,0,1), (1,0,0,1,1,0,1,1), (1,0,0,1,1,1,0,1), (1,0,0,1,1,1,1,1)
(1, 1, 1, -1, 0, 0, 0, 1, 0) | (0,0,0,1,0,0,0,0), (0,0,0,1,0,0,1,0), (0,0,0,1,0,1,0,0), (0,0,0,1,0,1,1,0), (0,0,0,1,1,0,0,0), (0,0,0,1,1,0,1,0), (0,0,0,1,1,1,0,0), (0,0,0,1,1,1,1,0)
(-1, 1, 1, 1, 0, 0, 0, 1, 0) | (1,0,0,0,0,0,0,0), (1,0,0,0,0,0,1,0), (1,0,0,0,0,1,0,0), (1,0,0,0,0,1,1,0), (1,0,0,0,1,0,0,0), (1,0,0,0,1,0,1,0), (1,0,0,0,1,1,0,0), (1,0,0,0,1,1,1,0)
(0, 0, 0, 1, 1, 1, 1, -1, 0) | (0,0,0,0,0,0,0,1), (0,0,1,0,0,0,0,1), (0,1,0,0,0,0,0,1), (0,1,1,0,0,0,0,1), (1,0,0,0,0,0,0,1), (1,0,1,0,0,0,0,1), (1,1,0,0,0,0,0,1), (1,1,1,0,0,0,0,1)
(0, 0, 0, 1, 1, -1, 1, 1, 0) | (0,0,0,0,0,1,0,0), (0,0,1,0,0,1,0,0), (0,1,0,0,0,1,0,0), (0,1,1,0,0,1,0,0), (1,0,0,0,0,1,0,0), (1,0,1,0,0,1,0,0), (1,1,0,0,0,1,0,0), (1,1,1,0,0,1,0,0)
(0, 0, 0, -1, 1, -1, 1, -1, 2) | (0,0,0,1,0,1,0,1), (0,0,1,1,0,1,0,1), (0,1,0,1,0,1,0,1), (0,1,1,1,0,1,0,1), (1,0,0,1,0,1,0,1), (1,0,1,1,0,1,0,1), (1,1,0,1,0,1,0,1), (1,1,1,1,0,1,0,1)

However, there are cases where no such conditional differential property exists. For example, two out of the eight S-boxes of Serpent [6] exhibit no such property. Even when the S-box under consideration can be described with this logical condition modelling technique, the inequalities generated may not be enough to produce a satisfactory result. In the following, a more general approach for generating valid cutting-off inequalities is proposed.

Convex Hull of All Possible Differentials for an S-box. The convex hull of a set $Q$ of discrete points in $\mathbb{R}^n$ is the smallest convex set that contains $Q$. A convex hull in $\mathbb{R}^n$ can be described as the common solutions of a set of finitely many linear (in)equalities as follows:

$$\begin{cases} \lambda_{0,0}x_0 + \cdots + \lambda_{0,n-1}x_{n-1} + \lambda_{0,n} \ge 0\\ \qquad\vdots\\ \gamma_{0,0}x_0 + \cdots + \gamma_{0,n-1}x_{n-1} + \gamma_{0,n} = 0\\ \qquad\vdots \end{cases} \qquad (11)$$
This is called the H-representation of a convex hull. Computing the H-representation of the convex hull of a set of finitely many points is a fundamental problem in computational geometry with many applications.

If we treat a differential of an $\omega\times\nu$ S-box as a point in $\mathbb{R}^{\omega+\nu}$, then we get a set of finitely many discrete points which includes all possible differential patterns of this S-box. For example, one possible differential pattern of the PRESENT S-box is 0x9 = 1001 → 0xE = 1110, which is identified with (1, 0, 0, 1, 1, 1, 1, 0). The sets of all possible differential patterns of S-boxes are essentially sets of finitely many discrete points in high-dimensional space; hence we can compute their convex hulls by standard methods in computational geometry.

We now define the convex hull of a specific $\omega\times\nu$ S-box to be the set of all linear (in)equalities in the H-representation of the convex hull, in $\mathbb{R}^{\omega+\nu}$, of all possible differential patterns of the S-box. The convex hull of a specific S-box can be obtained by using the inequality_generator() function in the sage.geometry.polyhedron module of the SAGE computer algebra system [29]. The convex hull of the PRESENT S-box contains 327 linear inequalities. Any one of these inequalities can be taken as a valid cutting-off inequality.
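The input to the convex-hull computation is simply the list of possible differential patterns as 0-1 points. The sketch below builds that list in plain Python; the wrapper `sage_hull_inequalities` is our own illustrative helper and only works inside a SAGE session, where `Polyhedron`, `inequality_generator()` and the `vector()` method of an inequality are SAGE calls. Equations (present if the point set is not full-dimensional) would come from `equation_generator()` and are ignored here.

```python
# PRESENT S-box (4x4), standard lookup table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def bits(v, n=4):
    """MSB-first bit tuple."""
    return tuple((v >> (n - 1 - i)) & 1 for i in range(n))

def possible_points(sbox, n=4):
    """Sorted list of all possible (input diff || output diff) 0-1 points."""
    pts = {bits(a, n) + bits(sbox[u] ^ sbox[u ^ a], n)
           for a in range(1 << n) for u in range(1 << n)}
    return sorted(pts)

def sage_hull_inequalities(points):
    """Illustrative helper, runs only inside SAGE.

    Each returned tuple (b, a_0, ..., a_{k-1}) encodes b + sum a_i x_i >= 0.
    """
    from sage.all import Polyhedron  # available only in a SAGE environment
    P = Polyhedron(vertices=[list(p) for p in points])
    return [tuple(ieq.vector()) for ieq in P.inequality_generator()]

pts = possible_points(SBOX)
print((1, 0, 0, 1, 1, 1, 1, 0) in pts)  # True  (0x9 -> 0xE)
print((1, 0, 0, 1, 1, 0, 1, 1) in pts)  # False (0x9 -> 0xB)
```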


3.2 Selecting Valid Cutting-off Inequalities from the Convex Hull: A Greedy Approach

The number of (in)equalities in the H-representation of a convex hull computed from a set of discrete points in $n$-dimensional space is very large in general. For instance, the convex hull (in $\mathbb{R}^8$) of a 4×4 S-box typically involves several hundred linear inequalities. Adding all of them to an MILP problem would make it unsolvable in practical time. Hence, it is necessary to select a small number, say $n$, of "best" inequalities from the convex hull. Here by "best" we mean that, among all possible selections of $n$ inequalities, the selected ones maximize the number of removed impossible differentials. Obviously, this is a hard combinatorial optimization problem. Therefore, we design a greedy algorithm, listed in Algorithm 1, to approximate the optimal selection.

This algorithm builds up a set of valid cutting-off inequalities by selecting at each step an inequality from the convex hull which maximizes the number of removed impossible differential patterns from the current feasible region. For instance, we select 6 valid cutting-off inequalities from the convex hull of the PRESENT S-box using Algorithm 1. Compared with the 6 valid cutting-off inequalities obtained by Theorem 1 (see Table 1), they cut off 24 more impossible differential patterns, which leads to a tighter feasible region.
Algorithm 1. Selecting $n$ inequalities from the convex hull $\mathcal{H}$ of an S-box

Input: $\mathcal{H}$: the set of all inequalities in the H-representation of the convex hull of an S-box; $\mathcal{X}$: the set of all impossible differential patterns of an S-box; $n$: a positive integer.
Output: $\mathcal{O}$: a set of $n$ inequalities selected from $\mathcal{H}$

$l^*$ := None; $\mathcal{X}^*$ := $\mathcal{X}$; $\mathcal{H}^*$ := $\mathcal{H}$; $\mathcal{O}$ := $\emptyset$;
for $i \in \{0,\dots,n-1\}$ do
    $l^*$ := the inequality in $\mathcal{H}^*$ which maximizes the number of removed impossible differential patterns from $\mathcal{X}^*$;
    $\mathcal{X}^*$ := $\mathcal{X}^*$ − {impossible differential patterns removed by $l^*$};
    $\mathcal{H}^*$ := $\mathcal{H}^*$ − $\{l^*\}$; $\mathcal{O}$ := $\mathcal{O} \cup \{l^*\}$;
end
return $\mathcal{O}$
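Algorithm 1 translates almost line by line into Python. In the sketch below, an inequality is a tuple $(c_0,\dots,c_{k-1},\theta)$ meaning $\sum_i c_i p_i + \theta \ge 0$ (our encoding), and ties are broken by taking the first maximizer, which Algorithm 1 leaves open. For a self-contained example we feed it the six CDP inequalities of Table 1 in place of a full convex hull; this substitution is only for illustration.

```python
from itertools import product

# PRESENT S-box (4x4), standard lookup table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def bits(v, n=4):
    """MSB-first bit tuple."""
    return tuple((v >> (n - 1 - i)) & 1 for i in range(n))

def violates(ineq, p):
    """ineq = (c_0, ..., c_{k-1}, theta) encodes sum c_i p_i + theta >= 0."""
    return sum(c * v for c, v in zip(ineq[:-1], p)) + ineq[-1] < 0

def greedy_select(H, X, n):
    """Algorithm 1: choose n inequalities from H, greedily cutting off points of X."""
    remaining = set(X)    # X*: impossible patterns not removed yet
    candidates = list(H)  # H*: inequalities not selected yet
    selected = []         # O
    for _ in range(n):
        if not candidates:
            break
        # l*: the candidate removing the most patterns still in X*
        best = max(candidates,
                   key=lambda ineq: sum(violates(ineq, p) for p in remaining))
        remaining = {p for p in remaining if not violates(best, p)}
        candidates.remove(best)
        selected.append(best)
    return selected, remaining

possible = {bits(a) + bits(SBOX[u] ^ SBOX[u ^ a])
            for a in range(16) for u in range(16)}
impossible = [p for p in product((0, 1), repeat=8) if p not in possible]

# Stand-in for a convex hull: the six CDP rows of Table 1.
CDP = [(-1, 1, 1, -1, 0, 0, 0, -1, 2),
       (1, 1, 1, -1, 0, 0, 0, 1, 0),
       (-1, 1, 1, 1, 0, 0, 0, 1, 0),
       (0, 0, 0, 1, 1, 1, 1, -1, 0),
       (0, 0, 0, 1, 1, -1, 1, 1, 0),
       (0, 0, 0, -1, 1, -1, 1, -1, 2)]
chosen, left = greedy_select(CDP, impossible, 6)
print(len(chosen), len(impossible) - len(left))  # 6 46
```

With a real convex hull as $\mathcal{H}$, the same function would choose among several hundred candidates instead of six.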


4 Automatic Security Evaluation

To obtain the security bound of a block cipher with respect to related-key differential attack, we can build an MILP model according to Sect. 2 with the constraints introduced in Sect. 3.1 and Sect. 3.2 included. Then we solve the MILP model using any MILP optimizer, and the optimal value, say $N$, is the minimum number of active S-boxes, from which we can deduce that the probability of the best differential characteristic is upper bounded by $\varepsilon^N$, where $\varepsilon$ is the maximum differential probability (MDP) of a single S-box.

However, it is computationally infeasible to solve an MILP model generated by an $r$-round block cipher with large $r$. In such cases, we can turn to the so-called simple split approach. We split the $r$-round block cipher into two parts with consecutive $r_1$ rounds and $r_2$ rounds such that $r_1 + r_2 = r$. Then we apply our method to these two parts. Assuming that there are at least $N_{r_1}$ and $N_{r_2}$ active S-boxes in the first and second part respectively, we can deduce that the probability of the best differential characteristic for this $r$-round cipher is upper bounded by $\varepsilon^{N_{r_1}+N_{r_2}}$. If $r_1$ and $r_2$ are still too large, they can be divided further into smaller parts. Note that our method is applicable to both the single-key and related-key models.
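The resulting bound is plain exponent arithmetic; working with $\log_2$ values keeps the numbers readable. The sketch below (plain Python, function name ours) reproduces two bounds reported in Appendix A: PRESENT-80 with 24 = 12 + 12 rounds and at least 16 active S-boxes per part, and LBlock with 32 = 11 + 11 + 10 rounds and 10, 10 and 8 active S-boxes, both with $\varepsilon = 2^{-2}$.

```python
import math

def log2_char_bound(eps, active_counts):
    """log2 of eps^(N_1 + N_2 + ...), an upper bound on the best characteristic."""
    return sum(active_counts) * math.log2(eps)

print(log2_char_bound(2 ** -2, [16, 16]))     # -64.0 (PRESENT-80, 12 + 12 rounds)
print(log2_char_bound(2 ** -2, [10, 10, 8]))  # -56.0 (LBlock, 11 + 11 + 10 rounds)
```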

4.1 Techniques for Getting Tighter Security Bounds

Technique 1. In the above analysis, we pessimistically (in the sense that we want to prove the security of a cipher) assume that all the active S-boxes take the MDP $\varepsilon$. However, this is unlikely to happen in practice, especially when the number of active S-boxes is minimized. Therefore, we have the following strategy for obtaining a tighter security bound for a $t$-round characteristic.

Firstly, compute the set $\mathcal{E}$ of all the differential patterns of an S-box with probabilities greater than or equal to the S-box's MDP $\varepsilon$.

Secondly, compute the H-representation $\mathcal{H}_{\mathcal{E}}$ of the convex hull of $\mathcal{E}$, and then use the inequalities selected from $\mathcal{H}_{\mathcal{E}}$ by Algorithm 1 to generate a $t$-round model according to Sect. 2 and Sect. 3. Note that the feasible region of this model is smaller than that of a $t$-round model generated in the standard way, since the differential patterns allowed in this model are more restricted. Hence, we hope to get a larger objective value than $N_t$, which is the result obtained by using the standard $t$-round model.

Finally, solve the model using a software optimizer. If the objective value is greater than $N_t$, we know that there is no differential characteristic with only $N_t$ active S-boxes such that all these S-boxes take differential patterns with probability $\varepsilon$. Hence, we can conclude that there is at least one active S-box taking a differential pattern with probability less than $\varepsilon$ in any $t$-round characteristic with only $N_t$ active S-boxes.
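Computing $\mathcal{E}$ only requires the DDT. The plain-Python sketch below uses the PRESENT S-box as an example: it takes $\varepsilon$ as the largest DDT entry over nonzero input differences divided by 16 and keeps every pattern whose probability reaches it. The trivial pattern 0 → 0 has probability 1 and is therefore kept, so inactive S-boxes remain allowed in the model.

```python
# PRESENT S-box (4x4), standard lookup table.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def bits(v, n=4):
    """MSB-first bit tuple."""
    return tuple((v >> (n - 1 - i)) & 1 for i in range(n))

def ddt(sbox, n=4):
    """DDT: table[a][b] = #{u : S(u) xor S(u xor a) == b}."""
    size = 1 << n
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for u in range(size):
            table[a][sbox[u] ^ sbox[u ^ a]] += 1
    return table

def max_prob_patterns(sbox, n=4):
    """Return (MDP, set of patterns whose probability is >= MDP)."""
    table = ddt(sbox, n)
    size = 1 << n
    best = max(table[a][b] for a in range(1, size) for b in range(size))
    # 0 -> 0 has probability 1 >= MDP, so the inactive pattern is kept too.
    E = {bits(a, n) + bits(b, n)
         for a in range(size) for b in range(size) if table[a][b] >= best}
    return best / size, E

eps, E = max_prob_patterns(SBOX)
print(eps)                              # 0.25
print((1, 0, 0, 1, 1, 1, 1, 0) in E)    # True:  DDT[0x9][0xE] = 4
print((1, 0, 0, 1, 0, 0, 1, 0) in E)    # False: DDT[0x9][0x2] = 2
```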

Technique 2. Yet another technique for obtaining a tighter security bound is inspired by Alex Biryukov et al.'s and Sareh Emami et al.'s (extended) split approach [11,22]. In Sun et al.'s work [48], the strategy for proving the security of an $n$-round iterative cipher against related-key differential attacks is to use the simple split approach. By employing the MILP technique, compute the minimum number $N_t$ of differentially active S-boxes for any consecutive $t$-round ($1 \le t < n$) related-key differential characteristic. Then a lower bound on the number of active S-boxes for the full ($n$-round) cipher can be obtained by computing $\sum_{j \in J} N_{t_j}$, where $J \subseteq \{1, 2, \dots\}$ and $\sum_{j \in J} t_j = n$. Note that the computational cost is too high to compute $N_n$ directly.

We point out that this simple "split strategy" can be improved to obtain a tighter security bound by exploiting more information about a differential characteristic. The main idea is that the characteristic covering round 1 to round $m$ and the characteristic covering round $m+1$ to round $2m$ should not be treated equally although they have the same number of rounds, since the starting difference of a characteristic of round $m+1$ to $2m$ is not as free as that of a characteristic of round 1 to round $m$. Therefore, we have the following strategy.

Firstly, split an $r$-round cipher into two parts: round 1 to round $r_1$, and round $r_1+1$ to round $r = r_1 + r_2$.

Secondly, construct an MILP model covering round 1 to round $r$. Change the objective function to be the sum of the variables marking the activity of all S-boxes from round $r_1+1$ to round $r$. Add some additional constraints on the number of active S-boxes covering round 1 to round $r_1$ (one way to obtain such constraints is to solve the model covering round 1 to round $r_1$).

Finally, solve the model using any software optimizer; the result is a lower bound on the number of active S-boxes of round $r_1+1$ to round $r$ ($r_2$ rounds in total) for any characteristic covering round 1 to round $r$.

We have applied the methods presented in this section to PRESENT-80 and 
LBlock, and the results are given in Appendix A. 
5 A Heuristic Method for Finding (Related-key) Differential Characteristics Automatically

To find a (related-key) differential characteristic with relatively high probability 
covering r rounds of a cipher is the most important step in (related-key) differ- 
ential cryptanalysis. Most of the tools for searching differential characteristics 
are essentially based on Matsui’s algorithm 02]. In this section, we propose an 
MILP based heuristic method for finding (related-key) differential characteris- 
tics. Compared to other methods, our method is easier to implement, and more 
flexible. 

Thanks to the valid cutting-off inequalities which can describe the property of 
an S-box according to its differential distribution table, our method can output 
a good (related-key) differential characteristic directly by employing the MILP 
technique. The procedure of our method is outlined as follows. 

Step 1. For every S-box $S$, select $n$ inequalities from the convex hull of the set of all possible differential patterns of $S$ using Algorithm 1, and generate an $r$-round MILP model in which we require that all variables involved are 0-1.

Step 2. Extract a feasible solution of the MILP model by using the Gurobi [45] optimizer.

Step 3. Check whether the feasible solution is a valid (related-key) differential characteristic. If it is a valid characteristic, the procedure terminates. Otherwise, go to Step 1, increase the number of inequalities selected from the convex hulls, and repeat the whole process.
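The control flow of Steps 1 to 3 can be sketched as follows. The three callables are hypothetical placeholders for the cipher-specific model builder, the solver call (Gurobi in the text) and the validity check, and the rule for increasing $n$ (a fixed step up to a cap) is our assumption; only the iteration logic is shown.

```python
from typing import Any, Callable, Optional, Tuple

def heuristic_search(build_model: Callable[[int], Any],
                     find_feasible: Callable[[Any], Optional[Any]],
                     is_valid_characteristic: Callable[[Any], bool],
                     n_start: int = 1, n_step: int = 1,
                     n_max: int = 100) -> Optional[Tuple[int, Any]]:
    """Grow the number n of selected hull inequalities per S-box until the
    solver's feasible solution is a valid characteristic (Steps 1-3)."""
    n = n_start
    while n <= n_max:
        model = build_model(n)           # Step 1: n inequalities per S-box, 0-1 variables
        solution = find_feasible(model)  # Step 2: any feasible solution (None if none found)
        if solution is not None and is_valid_characteristic(solution):
            return n, solution           # Step 3: valid characteristic, stop
        n += n_step                      # otherwise tighten the model and repeat
    return None

# Toy usage with stand-in callables: "solutions" become valid once n >= 3.
result = heuristic_search(lambda n: n, lambda m: {"n": m}, lambda s: s["n"] >= 3)
print(result)  # (3, {'n': 3})
```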

We have developed software using the Python interface provided by the Gurobi optimizer, which automates the whole process of the above method.

To demonstrate the practicability of our method, we have applied the methods 
presented in this section to SIMON and the results are given in Appendix B. 

On the Quality of the Characteristics. The characteristics found by this method are not guaranteed to be the best. However, if one is willing to wait until the optimizer outputs an optimal solution, the characteristic found by this method is guaranteed to have the minimum number of active S-boxes. Experimental results show that we get reasonably good solutions.

On the Flexibility of the Searching Algorithm. By adding a small number of additional constraints, our method can be used to search for characteristics with specific properties. For example, by setting some given variables marking the activity of some S-boxes to 1, we can search for characteristics with active S-boxes in predefined positions, which may be used in leaked-state forgery attacks [55]; by requiring the output and input variables to be the same, we can search for iterative characteristics; by setting all the variables marking the activity of the S-boxes in the key schedule algorithm to 0, we can search for characteristics with no active S-boxes in the key schedule algorithm, which may be preferred in related-key differential attacks.
6 Conclusion and Directions for Future Work

In this paper, we bring new constraints into the MILP model to describe the 
differential properties of a specific S-box, and obtain a more accurate MILP 
model for the differential behavior of a block cipher. Based on these constraints, 
we propose an automatic method for evaluating the security of bit-oriented block 
ciphers with respect to (related-key) differential attack. We also present a new 
tool for finding (related-key) characteristics automatically. 

At this point, several open problems emerge. Firstly, we observe that the MILP instances derived from such cryptographic problems are very hard to solve compared with general MILP problems of the same scale with respect to the numbers of variables and constraints. Hence, it is interesting to develop
specific methods to accelerate the solving process of such problems and therefore 
increase the number of rounds of the cipher under consideration that can be 
dealt with. Secondly, the method presented in this paper is very general. Is it 
possible to develop a compiler which can convert a standard description, say 
a description using hardware description language, of a cipher into an MILP 
instance to automate the entire security evaluation cycle with respect to (related- 
key) differential attack? 

Finally, the methodology presented in this paper has some limitations which we would like to make clear, and overcoming these limitations is a topic deserving further investigation. Firstly, this methodology is only suitable for evaluating the security of constructions with S-boxes, XOR operations and bit permutations, and cannot be applied to block ciphers like SPECK [5], which involve modular addition and no S-boxes at all. For tools which can be applied to ARX constructions, we refer the reader to |1‘2i:ffll4()l41li3|. Secondly, in this paper we do not consider the differential effect, and we assume that the expected differential probability (EDP) $\pi$ of a characteristic over all keys is (almost) the same as the fixed-key differential probability (DP) $\pi_k$ for almost all keys (the common hypothesis of stochastic equivalence [35]), and that if the upper bound of the EDP for any characteristic of a block cipher is less than $2^{-s}$, where $s$ is bigger than the block size or key size, then the block cipher is secure against the (related-key) differential attack. For a more in-depth discussion of the essential gap between the EDP $\pi$ and the DP $\pi_k$, we refer the reader to m for more information.
A On the Security of PRESENT-80 and LBlock with Respect to the Related-Key Differential Attack

A.1 Results on PRESENT-80

We apply the logical condition modelling method presented in Sect. 3.1 to the block cipher PRESENT-80 to determine its security bound with respect to the related-key differential attack. In each of these MILP models, we include one more constraint to ensure that the difference of the initial key register is nonzero, since the case where the difference of the initial key register is zero can be analyzed in the single-key model. Then we employ the Gurobi 5.5 optimizer [45] to solve the MILP instances.

By default the computations are performed on a PC using 4 threads with an Intel(R) Core(TM) Quad CPU (2.83GHz, 3.25GB RAM, Windows XP); a star (*) appended to a timing value marks that the corresponding computation was performed on a workstation equipped with two Intel(R) Xeon(R) E5620 CPUs (2.4GHz, 8GB RAM, 8 cores).

We compute the number of active S-boxes for PRESENT-80 in the related-key model up to 14 rounds; the results, together with a comparison with results obtained without CDP constraints, are summarized in Table 2. For example, according to the 6th row of Table 2, the Gurobi optimizer finds that the minimum number of active S-boxes for 6-round PRESENT-80 is at least 5, in no more than 16 seconds, by solving the MILP model with CDP constraints.

These results clearly demonstrate that the MILP models with CDP constraints lead to tighter security bounds. In particular, we have proved that there are at least 16 active S-boxes in the best related-key differential characteristic for any 12 consecutive rounds of PRESENT-80. Therefore, the probability of the best related-key differential characteristic of 24-round PRESENT-80 is upper bounded by $(2^{-2})^{16} \times (2^{-2})^{16} = 2^{-64}$, leading to the result that 24-round PRESENT-80 is resistant to the basic related-key differential attack based on related-key differential characteristics (rather than differentials).
Table 2. Results obtained from MILP models for PRESENT-80

       | With CDP constraints                 | Without CDP constraints
Rounds | # Active S-boxes | Time (in seconds) | # Active S-boxes | Time (in seconds)
1      | 0                | 1                 | 0                | 1
2      | 0                | 1                 | 0                | 1
3      | 1                | 1                 | 1                | 1
4      | 2                | 1                 | 2                | 1
5      | 3                | 5                 | 3                | 3
6      | 5                | 16                | 4                | 10
7      | 7                | 107               | 6                | 26
8      | 9                | 254               | 8                | 111
9      | 10               | 522               | 9                | 171
10     | 13               | 4158              | 12               | 1540
11     | 15               | 18124             | 13               | 8136
12     | 16               | 50017             | 15               | 18102
13     | 18               | 137160*           | 17               | 49537*
14     | 20               | 1316808*          | 18               | 685372*
15     | -                | > 20 days         | -                | > 20 days


A.2 Results on LBlock

Up to now, there is no concrete result concerning the security of full-round LBlock [56] against differential attack in the related-key model, due to a lack of proper tools for bit-oriented designs.

Since the encryption process of LBlock is nibble-oriented, the security of LBlock against single-key differential attack can be evaluated by word-oriented techniques. However, the "<<< 29" operations in the key schedule algorithm of LBlock destroy its overall nibble-oriented structure. In this subsection, we apply the method proposed in this paper to LBlock and obtain some results concerning its security against related-key differential attacks. Note that constraints of type (5) are omitted in our MILP models for LBlock, according to the explanations presented in previous sections.

From Table 3 we can deduce that the probability of the best differential characteristic for full LBlock (32 = 11 + 11 + 10 rounds in total) is upper bounded by $(2^{-2})^{10} \times (2^{-2})^{10} \times (2^{-2})^{8} = 2^{-56}$, where $2^{-2}$ is the MDP of a single S-box of LBlock.

In fact, there is an implicit trade-off between the number of constraints we use and the number of rounds we analyze. For example, we can use fewer constraints for every S-box and try to analyze more rounds, or we can use more constraints and focus on fewer rounds (but obtain stronger bounds). However, it is not a simple task to find the best trade-off due to our limited computational power. We did try to analyze more rounds by using only one inequality selected from the convex hull for every S-box. The largest number of rounds we were able to analyze is 13, and we have proved that there are at least 13 active S-boxes in any related-key characteristic for 13-round LBlock, on a PC in roughly 49 days.
Table 3. Results for related-key differential analysis on LBlock (the # Variables column records the sum of the number of 0-1 variables and the number of continuous variables in the MILP model).

Rounds | # Variables      | # Constraints | # Active S-boxes | Time (in seconds)
1      | 218 + 104 = 322  | 660           | 0                | 1
2      | 292 + 208 = 500  | 1319          | 0                | 1
3      | 366 + 312 = 678  | 1978          | 0                | 1
4      | 440 + 416 = 856  | 2637          | 0                | 1
5      | 514 + 520 = 1034 | 3296          | 1                | 2
6      | 588 + 624 = 1212 | 3955          | 2                | 12
7      | 662 + 728 = 1390 | 4614          | 3                | 38
8      | 736 + 832 = 1568 | 5273          | 5                | 128
9      | 810 + 936 = 1746 | 5932          | 6                | 386
10     | 884 + 1040 = 1924 | 6591         | 8                | 19932
11     | 958 + 1144 = 2102 | 7250         | 10               | 43793


Then, we try to improve the above result with the two techniques presented in Sect. 4.1. By using the first technique, we can show that there are at least 13 active S-boxes in a 13-round related-key differential characteristic of LBlock, and that there is at least one active S-box taking a differential pattern with probability $2^{-3}$ in any 13-round related-key differential characteristic of LBlock with only 13 active S-boxes. Therefore, the probability of a 13-round related-key differential characteristic of LBlock is upper bounded by $(2^{-2})^{12} \times 2^{-3} = 2^{-27}$.

We now turn to the second technique presented in Sect. 4.1. By adding to an 11-round (round 22 to round 32) MILP model the constraint that any characteristic covering round 22 to round 26 (5 rounds in total) has at least 1 active S-box (see Table 3) and at most 12 active S-boxes (if this is not the case, we would obtain better bounds than the result presented here), we can show that there are at least 3 active S-boxes in a characteristic covering round 27 to round 32. Combined with Fact 3, this implies that the probability of the best related-key differential characteristic for full LBlock is upper bounded by $2^{-27} \times 2^{-27} \times (2^{-2})^{3} = 2^{-60}$.

B Search for Related-Key Characteristics of SIMON48 

SIMON [5] is a family of lightweight block ciphers designed by the U.S. National Security Agency (NSA). For a detailed description of SIMON and existing attacks, we refer the reader to [2,3,23,24,28].

By treating the AND ($\mathbb{F}_2 \times \mathbb{F}_2 \to \mathbb{F}_2$) operation as a $2 \times 1$ S-box, we apply our
method to SIMON in the single-key model. For SIMON48 we obtain a 15-round
differential characteristic with probability $2^{-46}$ (see Table 4). This is the best
15-round differential characteristic for SIMON48 published so far. If we fix the
input and output differences to those suggested by the characteristic we found,
we can compute the probability of this differential. To do so, we search for all
characteristics in this differential with probability greater than $2^{-54}$ and sum
their probabilities. The result is $2^{-41.96}$, which is also the best result
published so far.
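Treating AND as a $2 \times 1$ S-box means using its difference distribution table. The sketch below enumerates that table by brute force. It shows that a zero input difference always gives a zero output difference, while any non-zero input difference gives output difference 0 or 1 with probability $1/2$ each. So each active AND contributes a factor $2^{-1}$ under this model. The function name `and_ddt` is ours.

```python
from itertools import product

def and_ddt() -> dict:
    """DDT of AND seen as a 2x1 S-box.

    Maps each input difference (a, b) to [count of output diff 0, count of output diff 1]
    over all four input values (x, y).
    """
    table = {}
    for a, b in product((0, 1), repeat=2):
        counts = [0, 0]
        for x, y in product((0, 1), repeat=2):
            out_diff = (x & y) ^ ((x ^ a) & (y ^ b))
            counts[out_diff] += 1
        table[(a, b)] = counts
    return table

for in_diff, counts in and_ddt().items():
    print(in_diff, counts)
```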

We emphasize that in our MILP models we treat the input bits of the AND
operation as independent, and the dependencies between the input bits of the
AND operation are not considered. Therefore, a characteristic obtained by our
method is not guaranteed to be valid. Hence, every time the Gurobi optimizer
outputs a good solution (characteristic), we check its validity and compute its
probability using the method presented in [2].
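To see why the independence assumption can fail, the sketch below exhaustively computes the output-difference distribution of the SIMON round function $f(x) = (S^1 x \wedge S^8 x) \oplus S^2 x$, where $S^i$ is left rotation by $i$ bits. It uses a 16-bit word (the SIMON32 word size, chosen only so that exhaustive search is cheap) and compares the exact result with the number of active AND gates counted as if their inputs were independent. For $\alpha = \texttt{0x0005}$, the same state bit feeds two active AND gates, so only 8 output differences occur (each with probability $2^{-3}$) instead of the 16 equally likely ones the independent model predicts. This is an illustration only, not the validity check of [2].

```python
from collections import Counter

WORD = 16                  # toy word size (SIMON32) so that all 2^16 inputs can be enumerated
MASK = (1 << WORD) - 1

def rotl(x: int, r: int) -> int:
    """Left rotation of a WORD-bit word by r bits (0 < r < WORD)."""
    return ((x << r) | (x >> (WORD - r))) & MASK

def simon_f(x: int) -> int:
    """SIMON round function: (S^1 x & S^8 x) ^ S^2 x."""
    return (rotl(x, 1) & rotl(x, 8)) ^ rotl(x, 2)

def exact_output_diffs(alpha: int) -> Counter:
    """Exact distribution of f(x) ^ f(x ^ alpha) over all inputs x."""
    return Counter(simon_f(x) ^ simon_f(x ^ alpha) for x in range(1 << WORD))

def naive_active_ands(alpha: int) -> int:
    """Number of AND gates with a non-zero input difference (independence model)."""
    return bin(rotl(alpha, 1) | rotl(alpha, 8)).count("1")

alpha = 0x0005
dist = exact_output_diffs(alpha)
print(naive_active_ands(alpha), len(dist))  # 4 8
```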


Table 4. Single-key differential characteristic of 15-round SIMON48 
Rounds Left Right 
Abstract. Impossible differential cryptanalysis has been shown to be a very
powerful form of cryptanalysis against block ciphers. Although extensively used,
these attacks are still not fully understood because of their high technicality.
Indeed, mistakes have been discovered in numerous applications, and many
attacks lack optimality. This paper aims, in a first step, at formalizing and
improving this type of attack and, in a second step, at applying our work to
block ciphers based on the Feistel construction. In this context, we derive
generic complexity analysis formulas for mounting such attacks and develop new
ideas for optimizing impossible differential cryptanalysis. These ideas include,
for example, testing parts of the internal state in order to reduce the number
of involved key bits. We also develop in a more general way the concept of
using multiple differential paths, an idea previously introduced in a more
restrained context. These advances lead to improvements of previous attacks
against well-known ciphers such as CLEFIA-128 and Camellia, as well as to new
attacks against 23-round LBlock and all members of the Simon family.

Keywords: block ciphers, impossible differential attacks, CLEFIA, 
Camellia, LBlock, Simon. 


1 Introduction 

Impossible differential attacks were independently introduced by Knudsen m 
and Biham et al. [5]. Unlike differential attacks [6], which exploit differential paths
of high probability, impossible differential cryptanalysis uses differentials that
have probability zero in order to eliminate the key candidates leading to such
impossible differentials.

The first step in an impossible differential attack is to find an impossible
differential covering the maximum number of rounds. This procedure has been
extensively studied, and there exist algorithms for finding such impossible
differentials efficiently mm. Once such a maximum-length impossible
differential has been found and placed, one extends it by some rounds in both
directions. Then, if a candidate key partially encrypts/decrypts a given pair
to the impossible differential, this key certainly cannot be the right one
and is rejected. This technique sieves the key space, and the remaining
candidates can be tested by exhaustive search.

Although impossible differential cryptanalysis has been extensively employed,
the key sieving step of the attack does not yet seem fully understood.
Indeed, this part of the procedure is highly technical and many parameters have
to be taken into consideration. Several questions naturally arise:

- how to choose the plaintext/ciphertext pairs;
- how to calculate the data necessary to mount the attack;
- what the time complexity of the overall procedure is;
- which parameters optimize the attack.

However, no simple and general way of answering these questions has been
provided until now, and the generality of most published attacks is lost within
the tedious details of each application. The problem with this approach is that
mistakes become very common and attacks become difficult to verify. Errors in
the analysis are often discovered and, as we demonstrate in the next paragraph,
many papers in the literature present flaws. These flaws include errors in the
computation of the time or data complexity, in the analysis of the memory
requirements, or in the complexity of some intermediate steps of the attacks.
We can cite many such cases for different algorithms, as shown in Table 1. Note,
however, that the list of flaws presented in this table is not exhaustive.


Table 1. Summary of flaws in previous impossible differential attacks on CLEFIA-128,
Camellia, LBlock and Simon. The symbol ✗ means that the attack does not work, while
✓ means that the corrected attack works. The error types are:
(1) the data complexity is higher than the codebook;
(2) a big computation flaw;
(3) small complexity flaws;
(4) the attack cannot be verified without implementation.


Algorithm                                       # rounds                        Ref.  Type of error  Repairability  Where discovered
CLEFIA-128 without whitening layers             14                              [36]  (1)            ✗              m
CLEFIA-128                                      13                              [30]  (4)            -              [7]
Camellia without FL/FL^{-1} layers              12                              [34]  (2)            ✗              this paper, similar problem as [33]
Camellia-128                                    12                              [33]  (2)            ✗              [25]
Camellia-128/192/256 without FL/FL^{-1} layers  11/13/14                        [23]  (3)            ✓              [34]
LBlock                                          22                              [26]  (3)            ✓              [27]
Simon (all versions)                            14/15/15/16/16/19/19/22/22/22   [3]   (1)            ✗              Table 1 of [3]
Simon (all versions)                            13/15/17/20/25                  [12]  (2)            ✗              this paper
this paper 
Such flaws can, for example, be found in analyses of the cipher CLEFIA. CLEFIA
is a lightweight 128-bit block cipher developed by SONY in 2007 [28] and adopted
in the international standard ISO/IEC 29192 for lightweight cryptography. This
cipher has attracted the attention of many researchers, and numerous attacks on
reduced-round versions have been published so far [31,32,30,24,28]. Most of these
attacks rely on impossible differential cryptanalysis. However, as pointed out by
the designers of CLEFIA m, some of these attacks seem to have flaws, especially
in the key filtering phase. Two examples:

- a recent paper by Blondeau [7] challenges the validity of the results in [30];
- for a claimed attack on 14 rounds of CLEFIA-128 [36], the designers of CLEFIA showed that the necessary data exceeds the whole codebook m.

Another extensively analyzed cipher is Camellia, the ISO/IEC 18033 standard
designed by Mitsubishi and NTT [4]. Among the numerous attacks presented
against this cipher, some of the most successful ones rely on impossible
differential cryptanalysis [34,33,22,25,23]. As for CLEFIA, some of these attacks
were found to have flaws:

- the attack from [33] was shown in [25] to be invalid;
- we discovered a similar error in the computation that invalidates the attack of [34];
- [34] reveals small flaws in [23].

Errors in impossible differential attacks were also detected for other ciphers:

- in a cryptanalysis of the lightweight block cipher LBlock [26], the time complexity turned out to be incorrectly computed [27];
- in [3], the data complexity is higher than the amount of data available in the block cipher Simon;
- in PEL, some parameters are not correctly computed.

During our analysis, we also discovered problems in some attacks that do not
seem to have been pointed out before. Moreover, the more complicated the
procedure becomes, the less optimal the approach tends to be. Many attacks
illustrate this lack of optimality. For example, a cryptanalysis of 22-round
LBlock [26] could easily be extended to 23 rounds if a more optimal approach
had been used to evaluate the data and time complexities. Another example is
an analysis of Camellia [22], which we improve in Section 4.

The above examples clearly show that impossible differential attacks suffer
from the lack of a unified and optimized approach. For this reason, the first aim
of our paper is to provide a general framework for dealing with impossible
differential attacks. In this direction, we provide new generic formulas for computing
the data, time and memory complexities. These formulas take into account the
different parameters that intervene in the attacks and provide a highly
optimized way of mounting them. Furthermore, we present some new techniques
that reduce either the data needed or the number of key bits that need to be
guessed. In particular, we present a new method that reduces the number of key
bits to be guessed by instead testing some bits of the internal state during the
sieving phase. This technique has some similarities with the methods introduced
in [15,17]. However, important differences exist, as the two techniques are
applied in completely different contexts. In addition, we apply and develop the
idea of multiple impossible differentials, introduced in [32], to obtain more data
for mounting our attacks. To illustrate the strength of our new approach, we
consider Feistel constructions and apply the above ideas to a number of block
ciphers, namely CLEFIA, Camellia, LBlock and Simon.

More precisely, on 13-round CLEFIA-128 we present an attack, as well as
different time/data trade-offs, that improve the time and data complexity of the
previous best known attack [25]. We also improve the complexity of the best
known attacks against all versions of Camellia [22]. In addition, to demonstrate
the generality of our method, we provide the results of our attacks against
23-round LBlock and all versions of the Simon block cipher. Our attack on LBlock
is the best attack so far in the single-key setting¹. Our attacks on Simon are the
best known impossible differential attacks for this family of ciphers, and the
best attacks in general for the three smaller versions of Simon.

Summary of Our Attacks. We present here a summary of our results on the
block ciphers CLEFIA-128, Camellia, LBlock and Simon and compare them to
the best impossible differential attacks known for the four analyzed algorithms.
This summary is given in Table 2, where a '*' marks an attack that is the best
cryptanalysis result on the target cipher. By the best known attack, we mean
any attack reaching the highest number of rounds, and with the best
complexities among those.

The rest of the paper is organized as follows. In Section 2, we present a generic
methodology for mounting impossible differential attacks, provide our
complexity formulas and show new techniques and improvements for attacking a
Feistel-like block cipher using impossible differential cryptanalysis. Section 3
is dedicated to the details of our attacks on CLEFIA, and Section 4 presents
our applications to all versions of Camellia. Due to lack of space, our
applications to LBlock and the Simon family of ciphers are given in the full
version of this paper m.

2 Complexity Analysis 

In this section we provide a complexity analysis of impossible differential
attacks against block ciphers, as well as some new ideas that help improve the
time and data complexities. To this end, we derive new generic formulas for the
complexity evaluation of such attacks. These formulas have two roles. First,
they clarify the attack procedure by rendering it as general as possible.
Second, they help optimize the time and data requirements. Establishing
generic formulas should help in mounting as well as verifying such attacks,
since it avoids complicated ad hoc procedures that often lead to mistakes.

An impossible differential attack consists mainly of two general steps. The first
one deals with the discovery of a maximum-length impossible differential, that
is, an input difference $\Delta_X$ and an output difference $\Delta_Y$ such that the probability


1 In [12], an independent and simultaneous result on 23-round LBlock with worse time 
complexity was proposed. 
Table 2. Summary of the best impossible differential attacks on CLEFIA-128,
Camellia, LBlock and Simon, and presentation of our results. A '*' indicates
that the attack is the best known attack against the target cipher. Note that
we provide only the best of our results with respect to the time complexity.
Other trade-offs can be found in the following sections. † See Section 4 for details.


Algorithm                                             Rounds  Time          Data (CP)   Memory (Blocks)  Reference
CLEFIA-128                                            13      2^{121.2}     2^{117.8}   2^{86.8}         [24]
CLEFIA-128, using state-test technique                13      2^{116.90}    2^{116.33}  2^{83.33}        Section 3
CLEFIA-128, using multiple impossible differentials   13      2^{122.26}    2 IHO 2     2^{82.60}        Section 3
CLEFIA-128, combining with state-test technique       13      2^{116.16}    2^{114.58}  2^{83.16}        ed*
Camellia-128                                          11      2^2           2^{122}     2^{98}           [22]
Camellia-128                                          11      2^{118.43}    2^{118.4}   2^{92.4}         Section 4
Camellia-192                                          12      2^{187.2}     2 173       2^{155.41}       [22]
Camellia-192                                          12      2^{161.06}    2^{119.7}   2^{150.7}        Section 4
Camellia-256                                          13      2 2t>il       2 173       2^{203}          [22]
Camellia-256                                          13      2^{225.06}    2^{119.71}  2^{198.71}       Section 4
Camellia-256†                                         14      2^{250.5}     2^{120}     2 12U            m
Camellia-256†                                         14      2^{220}       2^{118}     2^{173}          Section 4
LBlock                                                22      2^{79.28}     2 b8        2^{72.67}        m
LBlock                                                22      2^{71.53}     2^{60}      2^{59}           . 11 10 .
LBlock                                                23      2^{74.06}     2^{59.6}    2^{74.6}         mnoi*
Simon32/64                                            19      2^{62.56}     2 iz        2^{44}           HD*
Simon48/72                                            20      2^{70.09}     2^          2 s3             in*
Simon48/96                                            21      2^{94.73}     2 13        2^{70}           m*
Simon64/96                                            21      2^{94.50}     2 s3        2^{50}           m
Simon64/128                                           22      2^{120.50}    2 e4        2 ™              m
Simon96/96                                            24      2^{94.02}     2 s3        2^{51}           m
Simon96/144                                           25      2^{190.50}    2™          2"               HD
Simon128/128                                          27      2^{120.0}     2 s3        2 s1             HD
Simon128/192                                          28      2^{190.50}    2 128       2"               HD
Simon128/256                                          30      2^{254.08}    y 25        2™               HD


that $\Delta_X$ propagates to $\Delta_Y$ after a certain number of rounds, $r_\Delta$, is zero. The
second step, called the key sieving phase, consists in adding some rounds,
potentially in both directions. These extra rounds serve to verify which key
candidates partially encrypt (resp. decrypt) data to the impossible differential.
As this differential has probability zero, keys showing such behavior are clearly
not the right encryption key and are removed from the candidate key space.

We start by introducing the notation used in the rest of the paper. As in this
work we are principally interested in the key sieving phase, we start our attack
after a maximum-length impossible differential has been found for the target cipher.
- $\Delta_X$, $\Delta_Y$: input (resp. output) differences of the impossible differential.

- $r_\Delta$: number of rounds of the impossible differential.

- $\Delta_{in}$, $\Delta_{out}$: set of all possible input (resp. output) differences of the cipher.

- $r_{in}$: number of rounds of the differential path $(\Delta_X, \Delta_{in})$.

- $r_{out}$: number of rounds of the differential path $(\Delta_Y, \Delta_{out})$.


The differential $(\Delta_X \to \Delta_{in})$ (resp. $(\Delta_Y \to \Delta_{out})$) occurs with probability 1,
while the differential $(\Delta_X \leftarrow \Delta_{in})$ (resp. $(\Delta_Y \leftarrow \Delta_{out})$) is verified with
probability $2^{-c_{in}}$ (resp. $2^{-c_{out}}$), where $c_{in}$ (resp. $c_{out}$) is the number of bit-conditions
that have to be verified to obtain $\Delta_X$ from $\Delta_{in}$ (resp. $\Delta_Y$ from $\Delta_{out}$).

It is important to correctly determine the number of key bits intervening
during an attack. We call this quantity information key bits. In an impossible
differential attack, one starts by determining all the subkey bits that are involved
in the attack. We denote by $k_{in}$ the subset of subkey bits involved in the attack
during the first $r_{in}$ rounds, and by $k_{out}$ those involved during the last $r_{out}$
rounds. However, some of these subkey bits can be related to each other. For
example, two different subkey bits can actually be the same bit of the master
key. Alternatively, a bit in the set can be a combination of, or easily
determined by, other bits of the set. The key schedule determines how the key
bits in the target set are related. For computing the complexity of the attacks,
the parameter we actually need is the total number of information key bits
involved. From an information-theoretic point of view, this is the entropy (in
bits) of the involved key bits, which we denote by $|k_{in} \cup k_{out}|$.
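As an illustration, consider the special case of a key schedule that is linear over GF(2). Then each involved subkey bit is a linear form in the master-key bits, and $|k_{in} \cup k_{out}|$ is the rank of these forms. The sketch below assumes such a linear key schedule, a hypothetical simplification (non-linear key schedules require more care). Each form is encoded as an integer bit mask whose bit $i$ means master-key bit $m_i$ appears in the XOR. The example values are made up.

```python
def gf2_rank(forms: list[int]) -> int:
    """Rank over GF(2) of linear forms given as bit masks of master-key bits."""
    basis: dict[int, int] = {}  # leading-bit position -> reduced basis vector
    for v in forms:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]  # eliminate the leading bit and keep reducing
    return len(basis)

# Hypothetical example: the k_in bits are m0^m1 and m0^m2; k_out reuses m0^m1
# and adds m1^m2, which is the XOR of the two k_in bits.
k_in = [0b0011, 0b0101]
k_out = [0b0011, 0b0110]
print(len(k_in) + len(k_out), gf2_rank(k_in + k_out))  # 4 2
```

Here four subkey bits are involved, but they carry only two information key bits.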

We now describe our attack scenario on $(r_{in} + r_\Delta + r_{out})$ rounds
of a given cipher.


2.1 Attack Scenario 

Suppose that we are dealing with a block cipher of block size $n$ parametrized by
a key $K$ of size $|K|$. Let the impossible differential be placed between rounds
$(r_{in} + 1)$ and $(r_{in} + r_\Delta)$. As already said, the impossible differential implies that
an input difference $\Delta_X$ at round $(r_{in} + 1)$ cannot propagate to an output
difference $\Delta_Y$ at the end of round $(r_{in} + r_\Delta)$. Thus, for each given pair of
inputs (and their corresponding outputs), the goal is to discard the keys that
generate a difference $\Delta_X$ at the beginning of round $(r_{in} + 1)$ and, at the same
time, a difference $\Delta_Y$ at the output of round $(r_{in} + r_\Delta)$. We then need enough
pairs so that the number of non-discarded keys is significantly lower than the a
priori total number of key candidates.
Suppose that the first $r_{in}$ rounds have an input truncated difference in $\Delta_{in}$
and an output difference $\Delta_X$, which is the input of the impossible differential.
Suppose that there are $c_{in}$ bit-conditions that need to be verified so that $\Delta_{in}$
propagates to $\Delta_X$, and $|k_{in}|$ information key bits involved.

In a similar way, suppose that the last $r_{out}$ rounds have a truncated output
difference in $\Delta_{out}$ and an input difference $\Delta_Y$, which is the output of the
impossible differential. Suppose that there are $c_{out}$ bit-conditions that need to be
verified so that $\Delta_{out}$ propagates to $\Delta_Y$ in the backward direction, and $|k_{out}|$
information key bits involved.

We show next how to determine the amount of data needed for an attack. 


2.2 Data Complexity 


The probability that, for a given key, a pair of inputs already satisfying the
differences $\Delta_{in}$ and $\Delta_{out}$ verifies all the $(c_{in} + c_{out})$ bit-conditions is $2^{-(c_{in} + c_{out})}$.
In other words, this is the probability that, for a pair of inputs having a difference
in $\Delta_{in}$ and an output difference in $\Delta_{out}$, a key from the possible key set is
discarded. Therefore, by repeating the procedure with $N$ different input (or
output) pairs, the probability that a trial key is kept in the candidate keys set
is $P = (1 - 2^{-(c_{in} + c_{out})})^N$.

There is no unique strategy for choosing the number $N$ of input (or output)
pairs. This choice principally depends on the overall time complexity, which
is influenced by $N$, and on the induced data complexity. Different trade-offs are
therefore possible. A popular strategy, generally used by default, is to choose $N$
such that only the right key is left after sieving. This amounts to choosing $P$ as

$P = (1 - 2^{-(c_{in} + c_{out})})^N < \frac{1}{2^{|k_{in} \cup k_{out}|}}.$


In this paper we adopt a different approach that can help reduce the number
of pairs needed for the attack and offers better trade-offs between the data and
time complexity. More precisely, we permit smaller values of $N$. By proceeding
like this, we will probably be left with more than one key in our candidate keys
set and will need to perform an exhaustive search among the remaining
candidates, but the total time complexity of the attack will probably be much
lower. In practice, we start by considering values of $N$ such that $P$ is slightly
smaller than $\frac{1}{2}$, so as to reduce the exhaustive search by at least one bit. The smallest
value of $N$, denoted by $N_{min}$, verifying

$P = (1 - 2^{-(c_{in} + c_{out})})^{N_{min}} \approx e^{-N_{min} \times 2^{-(c_{in} + c_{out})}} = \frac{1}{2},$

is approximately $N_{min} = 2^{c_{in} + c_{out}}$ (more precisely, $N_{min} \approx \ln 2 \cdot 2^{c_{in} + c_{out}}$). Then we have to choose $N \geq N_{min}$.
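The relation between $P$, $N$ and $N_{min}$ can be checked numerically. The sketch below evaluates $\log_2 P$ with `math.log1p`, which avoids rounding $1 - 2^{-c}$, and finds the smallest $N$ with $P \le 1/2$, where $c = c_{in} + c_{out}$. The function names are ours.

```python
import math

def log2_keep_probability(n_pairs: float, c: int) -> float:
    """log2 of P = (1 - 2^-c)^N, the probability that a wrong key survives N pairs."""
    return n_pairs * math.log1p(-2.0 ** -c) / math.log(2)

def n_min(c: int) -> int:
    """Smallest N such that P = (1 - 2^-c)^N <= 1/2, with c = c_in + c_out."""
    return math.ceil(math.log(0.5) / math.log1p(-2.0 ** -c))

c = 16
n = n_min(c)
print(n, n / 2 ** c)  # the ratio is close to ln 2, i.e. N_min is about ln 2 * 2^c
```

For the default strategy, which keeps only the right key, the same expression shows that $N$ must reach roughly $|k_{in} \cup k_{out}| \cdot \ln 2 \cdot 2^{c_{in}+c_{out}}$. This is why allowing $P$ just below $1/2$ saves pairs.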

We then provide a solution for determining the cost of obtaining $N$ pairs such
that their input difference belongs to $\Delta_{in}$ and their output difference belongs
to $\Delta_{out}$. To the best of our knowledge, this is the first generic solution to this
problem. We evaluated this cost as

$C_N = \max\left\{ \min_{\Delta \in \{\Delta_{in}, \Delta_{out}\}} \left\{ \sqrt{N\, 2^{\,n+1-|\Delta|}} \right\},\; N\, 2^{\,n+1-|\Delta_{in}|-|\Delta_{out}|} \right\}, \qquad (1)$

where $|\Delta_{in}|$ and $|\Delta_{out}|$ denote the base-2 logarithms of the sizes of the corresponding
sets. A detailed explanation of how this formula is derived can be found in the full
version of the paper m. The cost $C_N$ also represents the amount of data needed.
Obviously, as the size of the state is $n$, the following inequality should hold:

$C_N \leq 2^n.$

This inequality simply states that the total amount of data used for the attack
cannot exceed the codebook. This condition is not satisfied in several cases from
[3], nor in the corrected version of [36], which invalidates the corresponding
attacks.
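Formula (1) and the codebook condition translate directly into the log domain. The sketch below is a direct transcription of Formula (1) as stated, not of its derivation. The function names and example parameters ($n = 64$, $|\Delta_{in}| = |\Delta_{out}| = 32$, $N = 2^{40}$) are hypothetical.

```python
def log2_data(log2_n_pairs: float, n: int, d_in: float, d_out: float) -> float:
    """log2 of C_N from Formula (1).

    n is the block size; d_in and d_out are |Delta_in| and |Delta_out| (log2 of set sizes).
    """
    # First argument of the max: min over Delta of sqrt(N * 2^(n + 1 - |Delta|)).
    first = min((log2_n_pairs + n + 1 - d) / 2 for d in (d_in, d_out))
    # Second argument of the max: N * 2^(n + 1 - |Delta_in| - |Delta_out|).
    second = log2_n_pairs + n + 1 - d_in - d_out
    return max(first, second)

def within_codebook(log2_cn: float, n: int) -> bool:
    """The attack is only valid if C_N <= 2^n."""
    return log2_cn <= n

cn = log2_data(40, n=64, d_in=32, d_out=32)
print(cn, within_codebook(cn, 64))  # 41.0 True
```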


2.3 Time and Memory Complexity 

We now detail the computation of the time complexity of the attack.
Note that the formulas that we present in this section are the first generic
formulas given for estimating the complexity of impossible differential attacks.

Following the early abort technique [23], the attack consists in storing the
$N$ pairs and testing the key candidates step by step, reducing at each step the
number of remaining possible pairs. The time complexity is then determined by
three quantities:

- The cost $C_N$, that is, the amount of data needed (see Formula (1)) for obtaining the $N$ pairs, where $N$ is such that $P < 1/2$.
- The number of candidate keys $2^{|k_{in} \cup k_{out}|}$, multiplied by the average cost of testing the remaining pairs. For all the applications that we have studied, this cost can be very closely approximated by $\left(N + 2^{|k_{in} \cup k_{out}|} \frac{N}{2^{c_{in}+c_{out}}}\right) C'_E$, where $C'_E$ is the ratio of the cost of partial encryption to that of full encryption.
- The cost of the exhaustive search for the key candidates still in the candidate keys set after the sieving.

Taking into account the cost $C_E$ of one encryption, we conclude that the time
complexity of the attack is

$T_{comp} = \left( C_N + \left( N + 2^{|k_{in} \cup k_{out}|} \frac{N}{2^{c_{in}+c_{out}}} \right) C'_E + 2^{|K|} P \right) C_E, \qquad (2)$

where $C_N = \max\left\{ \min_{\Delta \in \{\Delta_{in}, \Delta_{out}\}} \left\{ \sqrt{N\, 2^{\,n+1-|\Delta|}} \right\},\; N\, 2^{\,n+1-|\Delta_{in}|-|\Delta_{out}|} \right\}$, with
$N$ such that $P = \left(1 - 1/2^{c_{in}+c_{out}}\right)^N < 1/2$. The last term corresponds
to $2^{|K| - |k_{in} \cup k_{out}|} \cdot P \cdot 2^{|k_{in} \cup k_{out}|}$. Since we want the attack complexity to be
smaller than the exhaustive search complexity, the above quantity should be
smaller than $2^{|K|} C_E$.
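The three terms of Formula (2) can be compared numerically. The sketch below returns $\log_2(T_{comp}/C_E)$, i.e. the cost in full encryptions. It takes $\log_2 C_N$ as an input, for instance computed from Formula (1). All parameter values in the usage lines are hypothetical. Plain floating point is sufficient here because every term stays far below $2^{1023}$.

```python
import math

def log2_time(log2_cn: float, log2_n: float, k_info: int, c: int,
              key_size: int, ce_ratio: float) -> float:
    """log2 of T_comp / C_E from Formula (2).

    log2_cn: log2 C_N; log2_n: log2 N; k_info: |k_in U k_out|; c: c_in + c_out;
    key_size: |K|; ce_ratio: C'_E (partial-encryption cost / full-encryption cost).
    """
    n = 2.0 ** log2_n
    log2_p = n * math.log1p(-2.0 ** -c) / math.log(2)          # log2 P
    data_term = 2.0 ** log2_cn                                   # C_N
    sieve_term = (n + 2.0 ** (k_info + log2_n - c)) * ce_ratio   # (N + 2^k N / 2^c) C'_E
    brute_term = 2.0 ** (key_size + log2_p)                      # 2^|K| P
    return math.log2(data_term + sieve_term + brute_term)

# Hypothetical parameters: C_N = 2^30, |k_in U k_out| = 20, c = 16, |K| = 64, C'_E = 1/16.
t_small_n = log2_time(30, 20, 20, 16, 64, 1 / 16)  # exhaustive-search term dominates
t_large_n = log2_time(30, 22, 20, 16, 64, 1 / 16)  # exhaustive-search term becomes negligible
print(t_small_n, t_large_n)
```

Increasing $N$ shrinks $P$ exponentially and makes the exhaustive-search term negligible. In a real attack, however, $C_N$ also grows with $N$ through Formula (1), so the best $N$ balances the terms.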

Note that this is a minimum estimate of the complexity. In practice, and thanks
to the idea of Section 2.4, it approximates the actual time complexity very well,
as can be seen in the applications. In particular, the LBlock estimation that we
detail in m closely matches the exact calculation from [10]. The precise
evaluation of $C'_E$ (which is always smaller than 1) can only be done once the
attack parameters are known. However, $C'_E$ can be estimated quite accurately
by the ratio between the number of active S-boxes during a partial encryption
and the total number of S-boxes. Although this is not always the best
approximation, it is common practice.

Memory complexity. By using the early abort technique [23], the only elements
that need to be stored are the $N$ pairs. Therefore, the memory complexity of
the attack² is determined by $N$.

2.4 Choosing $\Delta_{in}$, $\Delta_{out}$, $c_{in}$ and $c_{out}$

We now explain the two possible ways of choosing $\Delta_{in}$, $\Delta_{out}$, $c_{in}$ and $c_{out}$. For
this, we introduce the following example, visualized in Figure 1, where we
consider an Sbox-based cipher. In this example, we only discuss $\Delta_{in}$ and $c_{in}$;
the approach for $\Delta_{out}$ and $c_{out}$ is identical.



Fig. 1. Choosing $\Delta_{in}$ and $c_{in}$


Suppose that the state is composed of two branches of four nibbles each. The
round function is composed of a non-linear layer S, seen as a concatenation of
four Sboxes $S_0, S_1, S_2$ and $S_3$, followed by a linear layer M. There are two
different ways of choosing $|\Delta_{in}|$ and $c_{in}$:

1. The most intuitive way is to consider $|\Delta_{in}| = 4 + 4$ and $c_{in} = 4$, as the size
of α and of β is 4 bits, and in the first round we want 4 bits to collide. In
this case, for a given key, the average probability that a pair taken out of
the $2^{4+4} \cdot 2^{4+4-1}$ pairs belonging to $\Delta_{in}$ leads to $\Delta_X$ is $2^{-4}$.

2. In general, the difference α can take $2^4 - 1$ different values. However, through
the differential distribution table of the Sbox $S_0$, each value is associated on
average with $2^3$ output differences³, so the possibilities for the difference β
are limited to $2^3$. Therefore, we can consider that $|\Delta_{in}| = 4 + 3$. But in this
case $c_{in} = 3$: for each input pair belonging to the $2^{4+3} \cdot 2^{4+3-1}$ possible
ones, there exist on average 2 values that make the differential transition
$\alpha \to \beta$ possible (instead of 1 in the previous case).

² If $N > 2^{|k_{in} \cup k_{out}|}$, we could store the discarded key candidates instead, but this is rarely
the case. Thus, we can consider a memory complexity of $\min\{N, 2^{|k_{in} \cup k_{out}|}\}$.

Using the generic formulas of Section 2.3, we can see that both cases
induce practically the same time complexity, as the difference in $N$ compensates
for the difference in $c_{in} + c_{out}$. However, the memory complexity, given by
$N$, is slightly better in case 2. Furthermore, case 2 performs a preliminary
filtering of pairs, which reduces the average cost of the early abort
technique [23].

In several papers, for example in [33] and [23], the second case is followed.
However, it is applied only partially (either for the input or the output part),
for no apparent reason. Note, however, that in these papers the associated
$c_{out}$ was not always correctly computed: sometimes 8-bit conditions were
considered when 7-bit conditions should have been accounted for. For
simplicity, we consider case 1 in our applications and check the actual memory
needed afterwards.
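The "$2^3$ output differences on average" of case 2 can be examined for a concrete S-box. For each non-zero input difference α, count how many output differences β have a non-zero DDT entry. The sketch below uses the PRESENT S-box purely as a stand-in, since the example cipher of Figure 1 does not specify its S-boxes. For any bijective 4-bit S-box, DDT entries are even and each row sums to 16, so each non-zero row has at most $8 = 2^3$ non-zero entries.

```python
# PRESENT S-box, used here only as a concrete bijective 4-bit example.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def ddt(sbox: list[int]) -> list[list[int]]:
    """Difference distribution table: table[a][b] = #{x : S(x) ^ S(x ^ a) = b}."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for x in range(size):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table

table = ddt(SBOX)
# Number of possible output differences beta for each non-zero input difference alpha.
reachable = [sum(1 for entry in table[a] if entry) for a in range(1, 16)]
avg = sum(reachable) / len(reachable)
print(reachable, avg)
```

The closer `avg` is to 8, the closer the S-box is to the assumption of footnote 3 used in case 2.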


2.5 Using Multiple Impossible Differentials to Reduce the Data 
Complexity 

In this section we explain a method for reducing the data complexity of an
attack. This method is inspired by the notion of multiple impossible
differentials, which was introduced by Tsunoo et al. [32] and applied to
12-round CLEFIA-128. The idea of this technique is to consider several impossible
differentials at once, instead of just one. We assume, as done in m, that the
differences in $\Delta_{in}$ (and in $\Delta_{out}$) lie in a closed set. There are a priori two
ways of doing this:

1. Take rotated versions of a certain impossible differential. We call $n_{in}$ the
number of different input pattern differences generated by the rotated
versions of the chosen impossible differential.

2. When the middle conditions have several impossible combinations, we can
consider the same first half of the differential path together with a rotated
version of the second one, so as to get a different impossible differential.
We call $n_{out}$ the number of different output pattern differences generated
by the rotated versions of the second part of the path that we consider.
For the sake of simplicity and without loss of generality, we only consider
the case of rotating the second half of the path.

It is important to point out that, for our analysis to be valid, in both cases
the number of conditions associated with the impossible differential attack should
stay the same. Both cases translate into a higher amount of available
data by redefining two quantities, $|\Delta'_{in}|$ and $|\Delta'_{out}|$, that take over the previous
roles of $|\Delta_{in}|$ and $|\Delta_{out}|$:

$|\Delta'_{in}| = |\Delta_{in}| + \log_2(n_{in})$ and $|\Delta'_{out}| = |\Delta_{out}| + \log_2(n_{out})$.

$|\Delta'_{in}|$ is the log of the total size of the set of possible input differences, and $|\Delta'_{out}|$
is the log of the total size of the set of possible output differences.

³ This quantity depends on the Sbox. In this example, we consider that all four Sboxes
have good cryptographic properties.

In this case, the data complexity $C_N$ is computed with the corrected values
for the input sizes, and it is easy to see that it is smaller than if only one path
had been used. The time complexity remains the same, except for the $C_N$ term.
Indeed, the middle term of Formula (2) is unchanged, since for a given pair the
number of involved key candidates is still $2^{|k_{in} \cup k_{out}|}$. Likewise, since the number of
involved possible partial keys is $n_{in} \cdot n_{out} \cdot 2^{|k_{in} \cup k_{out}|}$, the last term of Formula (2) is now

$\frac{2^{|K|}}{n_{in} \cdot n_{out} \cdot 2^{|k_{in} \cup k_{out}|}} \left( P \cdot n_{in} \cdot n_{out} \cdot 2^{|k_{in} \cup k_{out}|} \right) = 2^{|K|} P$

and so it also stays the same.
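The effect of multiple impossible differentials on the data can be seen by substituting $|\Delta'_{in}|$ and $|\Delta'_{out}|$ into Formula (1). The sketch below repeats the log-domain transcription of Formula (1) so that it is self-contained. The parameters are hypothetical ($n = 64$, $|\Delta_{in}| = |\Delta_{out}| = 32$, $N = 2^{40}$).

```python
import math

def log2_data(log2_n_pairs: float, n: int, d_in: float, d_out: float) -> float:
    """log2 of C_N from Formula (1); d_in and d_out are log2 set sizes."""
    first = min((log2_n_pairs + n + 1 - d) / 2 for d in (d_in, d_out))
    second = log2_n_pairs + n + 1 - d_in - d_out
    return max(first, second)

def log2_data_multiple(log2_n_pairs: float, n: int, d_in: float, d_out: float,
                       n_in: int = 1, n_out: int = 1) -> float:
    """Formula (1) evaluated with |Delta'_in| = |Delta_in| + log2(n_in), and likewise for out."""
    return log2_data(log2_n_pairs, n, d_in + math.log2(n_in), d_out + math.log2(n_out))

print(log2_data_multiple(40, 64, 32, 32), log2_data_multiple(40, 64, 32, 32, n_out=4))  # 41.0 39.0
```

With $n_{out} = 4$ rotated output patterns, the data drops by a factor of 4 for the same $N$. As explained above, the other terms of Formula (2) are unchanged.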

In Section 3 we present our attacks on CLEFIA. In some of these attacks, we
use multiple impossible differentials to reduce the data complexity. This
technique also proves particularly useful for some versions of the Simon family,
for which there is not enough available data to mount a valid attack with the
traditional method.


2.6 Introducing the State-Test Technique

We now introduce a new method that tests some part of the internal state
instead of guessing the key bits necessary for computing it. This is somewhat
reminiscent of the techniques presented in [15,17] in the context of
meet-in-the-middle attacks. However, the technique that we present in this
section, which we call the state-test technique, is different: it checks the
values of the internal state to determine whether all the involved candidates
can be discarded.

Very often during the key filtering phase of impossible differential attacks, the
size of the internal state that needs to be known is smaller than the number of
key bits on which it depends. As we will see, focusing on the values that a part of
the state can take allows us to eliminate some key candidates without considering
all the values of the involved key bits. The state-test technique works by fixing
s bits of the plaintexts, which reduces the number of information key bits by s.
We explain how this method works with a small example.

Consider a 32-bit Feistel construction, where each branch can be seen as a
concatenation of four nibbles (see Figure 2). Suppose that the round function is
composed of a non-linear layer S, seen as a concatenation of four 4-bit invertible
Sboxes $(S_0, S_1, S_2, S_3)$, and of a linear layer M on $(\mathbb{F}_2^4)^4$. For this example,
we suppose that the branch number of M (the minimal number of active Sboxes
in any two consecutive rounds) is less than 5. Let $\Delta_X = (\alpha, 0, 0, 0) | (0, 0, 0, 0)$ be
the input difference of the impossible differential, placed at the end of the second
round, and let $\Delta_{in} = (*, *, *, 0) | (*, *, *, *)$ be the difference at the input of the
block cipher. Note, however, that the leftmost side of $\Delta_{in}$ actually depends only
on a 4-bit non-zero difference δ, i.e. $\Delta_{in} = M(\delta, 0, 0, 0) | (*, *, *, *)$.





Fig. 2. Grey color stands for nibbles with non-zero difference. Hatched key nibbles 
correspond to the part of the subkeys that have to be guessed. The nibble x is the part 
of the state on which we apply the state-test technique. 


As can be seen in Figure 2, there are in total 4 active Sboxes and thus there
are $c_{in} = 16$ bit-conditions that have to be verified in order to have a transition
from $\Delta_{in}$ to $\Delta_X$. Therefore, the first step is to collect $N$ pairs, for which the
probability that a wrong key remains is $P = (1 - 2^{-(c_{in}+c_{out})})^N = (1 - 2^{-c_{in}})^N = (1 - 2^{-16})^N$.
The exact value of $N$ will be chosen so as to obtain the best trade-off for the complexities. Before
describing the new method, we start by explaining how this attack would
work in the classical way. As we can see in Figure 2, there are $3 \times 4$ bits that
have to be guessed ($K_{0,0}$, $K_{0,1}$ and $K_{0,2}$) in order to verify the conditions on the
first round, and there are $2 \times 4$ bits that have to be guessed ($K_{0,3}$ and $K_{1,0}$) in
order to verify the conditions on the second round.

Therefore, for all $N$ pairs, one starts by testing all the $2^4$ possible values of the
first nibble of $K_0$. After this first guess, $N \times 2^{-4}$ pairs remain on average, as there
are 4 bit-conditions that need to be verified by the guess through the first round.
Then one continues by testing the second and the third nibble of $K_0$ and finally
the last nibble of $K_0$ and the first nibble of $K_1$. At each step, the amount of
data remaining is divided by $2^4$. To summarize, we have $|k_{in} \cup k_{out}| = |k_{in}| = 20$
and $2^{c_{in}+c_{out}} = 2^{c_{in}} = 2^4 \cdot 2^4 \cdot 2^4 \cdot 2^4$. Then Formula (2) can be used to evaluate the
time complexity of the attack as

$$\left(C_N + \left(N + 2^{20}\frac{N}{2^{16}}\right)C'_E + 2^{20}P\,2^{|K|-20}\right)C_E. \qquad (3)$$

We now show how the state-test technique applies to this example and how
it decreases the time complexity. Consider the first nibble of the left
part of the state after the addition of the subkey $K_1$. We denote this nibble by
$x$. Mathematically, $x$ can be expressed as


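$$x = K_{1,0} \oplus P_{1,0} \oplus M(S(K_0 \oplus P_0))_0,$$

$$x \oplus P_{1,0} = K_{1,0} \oplus m_0 S_0(K_{0,0} \oplus P_{0,0}) \oplus m_1 S_1(K_{0,1} \oplus P_{0,1}) \oplus m_2 S_2(K_{0,2} \oplus P_{0,2}) \oplus m_3 S_3(K_{0,3} \oplus P_{0,3}), \qquad (4)$$

where the $m_i$'s are coefficients in $\mathbb{F}_{2^4}$.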

Suppose now that for all pairs we fix the last $s = 4$ bits of $P_0$ (the nibble $P_{0,3}$) to the same
constant value. One can verify that this is a reasonable assumption, as after fixing
this part of the inputs we still have enough data to mount the attack. Then
one starts as before by guessing the first three nibbles of $K_0$. After this 12-bit
guess, approximately $N \times 2^{-12}$ pairs remain. For each pair we know the input
and output differences of the Sbox of the second round, as the needed part of $K_0$
has been guessed. Therefore, by a simple lookup in the differential distribution
table of the involved Sbox, we obtain on average one value of $x$ per pair that
verifies the second-round conditions (about half of the time the transition is not
possible, whereas for the other half we find two values). Equation (4) becomes

$$x \oplus P_{1,0} \oplus m_0 S_0(K_{0,0} \oplus P_{0,0}) \oplus m_1 S_1(K_{0,1} \oplus P_{0,1}) \oplus m_2 S_2(K_{0,2} \oplus P_{0,2}) = K_{1,0} \oplus m_3 S_3(K_{0,3} \oplus P_{0,3}), \qquad (5)$$

where the left side of Equation (5), which we denote by $x'$, is known for each pair.
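The difference-distribution-table lookup used above can be sketched in Python. This is a minimal sketch under an explicit assumption: the example leaves $S_0, \ldots, S_3$ unspecified, so the PRESENT S-box is used as a hypothetical stand-in. The helper names `ddt` and `solutions` are illustrative only. The sketch builds the table and lists the values of $x$ compatible with a given input/output difference.

```python
# Hypothetical stand-in S-box: the PRESENT S-box (the example's S-boxes are unspecified).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def ddt(sbox):
    """Difference distribution table: table[d_in][d_out] = #{x : S(x) ^ S(x ^ d_in) = d_out}."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for d_in in range(size):
        for x in range(size):
            table[d_in][sbox[x] ^ sbox[x ^ d_in]] += 1
    return table

def solutions(sbox, d_in, d_out):
    """All S-box inputs x compatible with the transition d_in -> d_out."""
    return [x for x in range(len(sbox)) if sbox[x] ^ sbox[x ^ d_in] == d_out]

DDT = ddt(SBOX)
# Average number of solutions over all non-zero (d_in, d_out) transitions.
nonzero = [DDT[a][b] for a in range(1, 16) for b in range(1, 16)]
average = sum(nonzero) / len(nonzero)
impossible = sum(v == 0 for v in nonzero) / len(nonzero)
print(f"average solutions {average:.3f}, impossible transitions {impossible:.2f}")
```

For any bijective 4-bit S-box, each non-zero row of the table sums to 16 over the 15 non-zero output differences. A random non-zero transition therefore has 16/15 solutions on average, which matches the "one value per pair on average" used in the text. Solutions always come in pairs $\{x, x \oplus d_{in}\}$.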

Thus, for each guess of $(K_{0,0}, K_{0,1}, K_{0,2})$, we construct a table of size $N \times 2^{-12}$
in which we store these values of $x'$. The last and most important step now consists
in checking whether all $2^4$ possible values of $x'$ appear in the table. Note that, as
$N > 2^{16}$, the size of the table is necessarily greater than or equal to $2^4$.

Since $P_{0,3}$ is fixed, the only unknown values in Equation (5) are $K_{1,0}$ and
$K_{0,3}$. If all values of $x'$ are in the table then, since $S_3$ is a permutation, for any
choice of $K_{1,0}$ and any choice of $K_{0,3}$ there will always exist (at least) one pair
such that $K_{1,0} \oplus m_3 S_3(K_{0,3} \oplus P_{0,3})$ is in the table, thus leading to the impossible
differential.

In conclusion, if $x'$ takes all the possible values in the table,
we can remove the keys composed of the guessed value $(K_{0,0}, K_{0,1}, K_{0,2})$ from
the set of candidate keys, as for all the values of $(K_{1,0}, K_{0,3})$ they would imply
the impossible differential. If instead $x'$ does not take all the possible values
for a certain value of $(K_{0,0}, K_{0,1}, K_{0,2})$, we can test this partial key combined
with all the possibilities of the remaining key bits that verify Equation (5) for the
missing values of $x'$, as they belong to the remaining key candidates.
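The equivalence between the state test and classical filtering of $(K_{1,0}, K_{0,3})$ can be checked at toy scale. This minimal sketch rests on several assumptions, because the example does not fix these details:

- the S-box $S_3$ (taken to be the PRESENT S-box);
- the reduction polynomial $x^4 + x + 1$ for $\mathbb{F}_{2^4}$;
- the contents of the table of $x'$ values.

All function names are hypothetical.

```python
# Assumptions: S_3 is the PRESENT S-box and GF(2^4) uses the modulus x^4 + x + 1;
# neither is fixed by the toy example in the text.
S3 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
      0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def gf16_mul(a, b):
    """Multiply two elements of GF(2^4) modulo x^4 + x + 1."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return result

def classical_survivors(table, m3, p03):
    """Classical filtering: try all 2^8 values of (K_{1,0}, K_{0,3}) and keep those
    for which no stored x' equals K_{1,0} ^ m3*S3(K_{0,3} ^ P_{0,3})."""
    values = set(table)
    return {(k10, k03) for k10 in range(16) for k03 in range(16)
            if k10 ^ gf16_mul(m3, S3[k03 ^ p03]) not in values}

def state_test_survivors(table, m3, p03):
    """State test: only the set of x' values present in the table matters."""
    missing = set(range(16)) - set(table)
    if not missing:
        return set()  # every x' occurs: the whole 12-bit guess of K_0 is discarded
    # For each K_{0,3}, every missing x' determines exactly one surviving K_{1,0}.
    return {(x ^ gf16_mul(m3, S3[k03 ^ p03]), k03) for x in missing for k03 in range(16)}

# Hypothetical table of x' values kept for one guess of (K_{0,0}, K_{0,1}, K_{0,2}).
table = [0x3, 0x7, 0x7, 0xA, 0x0, 0xF]
survivors = state_test_survivors(table, m3=0x9, p03=0x5)
assert survivors == classical_survivors(table, m3=0x9, p03=0x5)
print(len(survivors), "candidate (K_{1,0}, K_{0,3}) values remain")
```

Both functions return the same set, of size $16 \times$ (number of missing values of $x'$). The state test only needs the set of values present in the table to decide whether the guessed $(K_{0,0}, K_{0,1}, K_{0,2})$ can be discarded outright.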

The main gain of the state-test technique is that it decreases the number
of information key bits and therefore the time complexity. For instance,
in this example, the variable $x'$ can be seen as 4 information key bits (see footnote 4) instead
of the $2 \times 4$ key bits we had to guess in the classical approach (the bits of $K_{0,3}$ and of
$K_{1,0}$). We have $s = 4$ fewer bits to guess thanks to the $s = 4$ bits of the plaintext
that we have fixed. Thus the time complexity in this case becomes

$$\left(C_N + \left(N + 2^{20-4}\frac{N}{2^{16}}\right)C'_E + 2^{20-4}P\,2^{|K|-(20-4)}\right)C_E. \qquad (6)$$

Comparing Equations (3) and (6), one can see that the time complexity
is lower with the state-test technique than with the classical method. Indeed, the
first and the third terms of Equations (6) and (3) are the same, while the
second term is lower in Equation (6). Finally, note that the probability $P$ for
a key to remain in the candidate keys set is the same as before. Indeed,
during the attack we detect exactly the candidate keys for which none of
the $N$ pairs implies the impossible differential, which are the same candidate
keys that we would have detected in a classical attack.
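Writing out the second and third terms of Equations (3) and (6) makes the comparison explicit. This assumes, as in both equations, that $c_{in} + c_{out} = 16$, and the first term $C_N$ is identical in both.

```latex
\underbrace{2^{20}\,\frac{N}{2^{16}}}_{\text{Eq. (3)}} = 2^{4}N
\quad\longrightarrow\quad
\underbrace{2^{20-4}\,\frac{N}{2^{16}}}_{\text{Eq. (6)}} = N,
\qquad
2^{20}P\,2^{|K|-20} \;=\; 2^{20-4}P\,2^{|K|-(20-4)} \;=\; 2^{|K|}P .
```

Fixing $s = 4$ plaintext bits thus divides the key-sieving part of the second term by $2^s$. The exhaustive-search term $2^{|K|}P$ is left unchanged.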

We would like to note here that we have implemented the state-test technique 
on a toy cipher, having a structure similar to the one that we introduced in this 
section, and we have verified its correctness. 

Application of the state-test technique in parallel for decreasing the probability P.
An issue that could appear with this technique is that, as we have to fix $s$ bits
of the plaintexts, the amount of data available for computing the $N$ pairs
is reduced. The probability $P$ associated with an attack is the probability for a
key to remain in the candidate keys set. When the amount of available data is
small, the number of pairs $N$ that we can construct is equally small, and thus
the probability $P$ is high. In such a situation, the dominant term of the time
complexity (Formula (2)) is in general the third one, i.e. $2^{|K|}P$.

More precisely, we need the sum of $\log_2(C_N)$ and $s$, the number of plaintext
bits that we fix, to be less than or equal to the block size. This limits the size
of $N$ that we can consider, leading to higher probabilities $P$, and could sometimes
lead to higher time complexities. To avoid this, one can repeat the attack
in parallel for several different values, say $Y$, of the fixed part of the plaintext. In
this case, the data and memory needed are multiplied by $Y$. On the other hand,
repeating the attack in parallel allows us to detect more efficiently whether a guessed
key could be the right one. Indeed, for a guessed key, one tests whether it is the
correct one only if none of the tables constructed as described above contains
all the values of $x'$.
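Assume that the $Y$ parallel instances filter a wrong key independently, which is what writing the survival probability as $P^Y$ presupposes. Then, with $P = (1 - 2^{-(c_{in}+c_{out})})^N$ for a single instance, the effect of the repetition can be written out explicitly:

```latex
P^{Y} = \left(\left(1 - 2^{-(c_{in}+c_{out})}\right)^{N}\right)^{Y}
      = \left(1 - 2^{-(c_{in}+c_{out})}\right)^{N \times Y}
      \approx \exp\!\left(-\frac{N \times Y}{2^{c_{in}+c_{out}}}\right).
```

Under this assumption, $Y$ tables built from $N$ pairs each sieve wrong keys as strongly as a single attack using $N \times Y$ pairs would. Meanwhile, no single instance needs more data than the $s$ fixed plaintext bits allow.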

To summarize, by repeating the state-test technique in parallel, we multiply
the available data by $Y$, as well as the available pairs, and since the attack is done
$Y$ times in parallel, the probability $P$ becomes $P^Y$. The probability decreases
much faster than the data or the other terms of the time complexity increase.
Therefore, Formula (2) becomes in this case:

$$\left(C_N \times Y + \left(N \times Y + 2^{|k_{in} \cup k_{out}|}\frac{N \times Y}{2^{c_{in}+c_{out}}}\right)C'_E + 2^{|K|}P^{Y}\right)C_E. \qquad (7)$$

4 Note that we could, equivalently, consider all possible values of $x$ in the last step,
and consider the associated table of remaining pairs, which would have a size of $N \times 2^{-16}$
(empty if the key is a good candidate, non-empty otherwise), obtaining the same key
candidates of 16 bits, 12 from $(K_{0,0}, K_{0,1}, K_{0,2})$ and 4 information key bits from $x$,
with the same complexity as in the previously described method.

In Section 3 we present an application of this technique to 13-round
CLEFIA-128, and at the end of Section 4 we show an application to Camellia-256.

3 Application to CLEFIA 

CLEFIA is a lightweight 128-bit block cipher designed by Shirai et al. in 2007 [28]
and based on a 4-branch generalized Feistel network. It supports keys of size
128, 192 or 256 bits, and the total number of iterations, say $R$, depends on the
key size. More precisely, $R = 18$ for the 128-bit version, while $R = 22$ and
$R = 26$ for the 192-bit and 256-bit variants. A key-scheduling algorithm is used to
generate $2R$ round keys $RK_0, \ldots, RK_{2R-1}$ and 4 whitening keys $WK_0, \ldots, WK_3$.
The whitening keys are XORed to the right branches of the first and the last
round. CLEFIA's round function design can be visualized in Figure 3. For a
more complete description of the specifications, one can refer to [28].

We now describe several attacks against 13-round CLEFIA-128.


3.1 Impossible Differential Cryptanalysis of 13-round CLEFIA-128 

The authors of [31] noticed that a difference on the internal state of CLEFIA
of the form $\Delta_i = 0_{32} \| 0_{32} \| 0_{32} \| A$ cannot lead to a difference $\Delta_{i+9} = 0_{32} \| 0_{32} \| B \| 0_{32}$
after 9 rounds, where $A$ and $B$ are 4-byte vectors, each with a single active byte,
the two active bytes being in different positions (e.g. $A = (\alpha, 0_8, 0_8, 0_8)$ and $B = (0_8, \beta, 0_8, 0_8)$). We
use this same 9-round impossible differential and place it between rounds 3 and
11. Therefore, for our attack, $r_{in} = r_{out} = 2$ and $r_\Delta = 9$, as in [31].





Fig. 3. The attack on CLEFIA-128. Grey color stands for bytes with a non-zero difference,
while hatched bytes are the subkey bytes that have to be guessed.
The differentials placed before and after the impossible differential are depicted
in Figure 3. We now describe the parameters of our cryptanalysis of 13-round
CLEFIA-128. As can be seen in Figure 3, there are $c_{in} + c_{out} = 40 + 40$
bit-conditions that need to be verified so that the difference in the plaintexts
$\Delta_{in} = 0_{32} \| (*_8, 0_8, 0_8, 0_8) \| M_0(*_8, 0_8, 0_8, 0_8) \| *_{32}$ propagates
to $\Delta_X = 0_{32} \| 0_{32} \| 0_{32} \| (\alpha, 0_8, 0_8, 0_8)$ and the difference in the ciphertexts
$\Delta_{out} = 0_{32} \| (0_8, *_8, 0_8, 0_8) \| M_1(0_8, *_8, 0_8, 0_8) \| *_{32}$ propagates to
$\Delta_Y = 0_{32} \| 0_{32} \| (0_8, \beta, 0_8, 0_8) \| 0_{32}$. In this way, $|\Delta_{in}| = |\Delta_{out}| = 48$.

Following the complexity analysis of Section 2, we need to construct at least
$N_{min} = 2^{80}$ pairs. The cost to construct these pairs is

$$C_{N_{min}} = \max\left\{\sqrt{2^{80}\,2^{129-48}},\ 2^{80}\,2^{129-48-48}\right\} = 2^{113}.$$

Using the state-test technique. We now use the state-test technique, described
in Section 2.6, to test the 8 bits of the internal state denoted by $x$ in Figure 3
instead of guessing the whole subkey $RK_0$ and the XOR of the leftmost byte of
$RK_2$ and $WK_0$. For doing this, we need to fix part of the 32 leftmost bits of the
plaintexts. As the amount of data needed is $C_{N_{min}} = 2^{113}$, we can fix at most
$128 - 113 = 15$ bits. However, as each Sbox is applied to 8 bits, we only fix
one byte of this part of the plaintexts. We then guess the 24 bits of the subkey
$RK_0$ that are situated on the other bytes.

In a classical attack procedure, we would need to guess 32 bits of $RK_1$, 32
bits of $RK_0$ and 8 bits of $RK_2 \oplus WK_0$, thus $|k_{in}| = 72$. We would also need to guess
8 bits of $RK_{23} \oplus WK_2$, 32 bits of $RK_{24}$ and 32 bits of $RK_{25}$, therefore $|k_{out}| = 72$.
However, the subkeys $RK_1$ and $RK_{24}$ have 22 bits in common. As a consequence,
the number of information key bits would be $|k_{in} \cup k_{out}| = 72 + 72 - 22 = 122$.
As we fix 8 bits of the plaintexts, according to Section 2.6 this amounts to
having $|k_{in} \cup k_{out}| - 8 = 122 - 8 = 114$ bits to test. The time
complexity of our attack, computed using Formula (2), is then

$$\left(C_N + \left(N + 2^{114}\frac{N}{2^{80}}\right)\frac{18}{104} + 2^{128}P\right)C_E,$$

where the fraction 18/104 is the ratio of the cost of a partial encryption to that of a
full encryption. Since our attack needs at least $2^{113}$ plaintexts and since we fixed
8 bits of them, we have $128 - 113 - 8 = 7$ bits of freedom for building
structures.

Among all possible trade-offs with respect to the amount of data, the best
time complexity is $2^{116.90} C_E$ with $2^{83.33}$ pairs built from $2^{116.33}$ plaintexts.
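The trade-off search can be reproduced numerically. The sketch below is a hypothetical helper, not the authors' tool. It assumes that the data formula of Section 2 reads
$C_N = \max\{\min(\sqrt{N 2^{n+1-|\Delta_{in}|}}, \sqrt{N 2^{n+1-|\Delta_{out}|}}),\ N 2^{n+1-|\Delta_{in}|-|\Delta_{out}|}\}$,
which reproduces the $2^{113}$ values quoted in this section. It evaluates the time formula above in the $\log_2$ domain, with $P = (1 - 2^{-(c_{in}+c_{out})})^N$.

In the code, `d_in` and `d_out` stand for $|\Delta_{in}|$ and $|\Delta_{out}|$, and all function names are illustrative.

```python
import math

def log2_sum(*terms):
    """Return log2(2**t1 + 2**t2 + ...) without overflowing."""
    top = max(terms)
    return top + math.log2(sum(2.0 ** (t - top) for t in terms))

def log2_data(log2_pairs, n, d_in, d_out):
    """log2(C_N) under the assumed formula
    max{ min(sqrt(N 2^(n+1-d_in)), sqrt(N 2^(n+1-d_out))), N 2^(n+1-d_in-d_out) }."""
    structures = min(log2_pairs + n + 1 - d_in, log2_pairs + n + 1 - d_out) / 2
    return max(structures, log2_pairs + n + 1 - d_in - d_out)

def log2_time(log2_pairs, n, d_in, d_out, key_info, c, key_size, ce_ratio):
    """log2 of (C_N + (N + 2^key_info * N / 2^c) * C'_E + 2^|K| * P), in units of C_E,
    where P = (1 - 2^-c)^N is computed in the log domain."""
    log2_p = 2.0 ** log2_pairs * math.log1p(-2.0 ** -c) / math.log(2)
    sieving = log2_sum(log2_pairs, key_info + log2_pairs - c) + math.log2(ce_ratio)
    return log2_sum(log2_data(log2_pairs, n, d_in, d_out), sieving, key_size + log2_p)

def best_tradeoff(lo, hi, step=0.01, **params):
    """Scan log2(N) over [lo, hi]; return (log2 time, log2 N) at the minimum."""
    steps = int(round((hi - lo) / step))
    return min((log2_time(lo + i * step, **params), lo + i * step) for i in range(steps + 1))

# State-test attack on 13-round CLEFIA-128: 122 - 8 = 114 key bits to test, c_in + c_out = 80.
clefia = dict(n=128, d_in=48, d_out=48, key_info=114, c=80, key_size=128, ce_ratio=18 / 104)
log2_t, log2_n = best_tradeoff(80, 90, **clefia)
print(f"time 2^{log2_t:.2f} C_E with N = 2^{log2_n:.2f} pairs "
      f"from 2^{log2_data(log2_n, 128, 48, 48):.2f} plaintexts")
```

With these parameters the scan should land close to the figures quoted above. The multiple-impossible-differential variant described next can be evaluated the same way by passing `d_in=51`, `d_out=48 + math.log2(3)` and `key_info=122`.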

Using multiple impossible differentials. The authors of [31] noticed that there
exist several different 9-round impossible differentials; see [31, Table 1]. In [32],
multiple impossible differentials were used to attack 12 rounds of CLEFIA-128.
Here, we apply our formalized approach to this idea, presented in Section 2,
to reduce the data complexity of the attack on 13 rounds of CLEFIA-128.

We use the $n_{in} = 2 \times 4$ different input differences of the impossible differentials, that is
$\Delta_i = 0_{32} \| A \| 0_{32} \| 0_{32}$ and $\Delta_i = 0_{32} \| 0_{32} \| 0_{32} \| A$, where $A$ has a non-zero difference on only
one of its four bytes. For each of them, there are $n_{out} = 3$ different
output differences $\Delta_{i+9} = 0_{32} \| 0_{32} \| B \| 0_{32}$ after 9 rounds, where $B$ has
only one active byte, in a different position from the active byte of $A$. We
now have $|\Delta'_{in}| = |\Delta_{in}| + \log_2(8) = 48 + 3$ and $|\Delta'_{out}| = |\Delta_{out}| + \log_2(3) = 48 + 1.58$.
Since the bit-conditions remain unchanged, $c_{in} + c_{out} = 80$, the minimal number
of pairs needed for the attack to work is $N_{min} = 2^{80}$. For this number of pairs,
we need $C_{N_{min}} = 2^{113-4.58} = 2^{108.42}$ plaintexts. The number of information key
bits is $|k_{in} \cup k_{out}| = 122$. The time complexity is then
$\left(C_N + \left(N + 2^{122}\frac{N}{2^{80}}\right)\frac{18}{104} + 2^{128}P\right)C_E$.
Among all the possible trade-offs with respect to the amount of data, the best
time complexity we obtained is $2^{122.26} C_E$ with $2^{82.6}$ pairs built from $2^{111.02}$ plaintexts.
Recall that the aim of this approach is to reduce the data complexity;
thus, in this attack, the gain on the data complexity is the important part (see footnote 5).

In the full version of this paper, we show how to combine the state-test
technique with multiple impossible differentials in order to reduce at the same time
the time and the data complexity of the attacks on CLEFIA-128.

4 Applications to Camellia 

Camellia is a 128-bit block cipher designed by Aoki et al. in 2000 [4]. It is
a Feistel-like construction where two key-dependent layers $FL$ and $FL^{-1}$ are
applied every 6 rounds to each branch. Whitening keys are also applied to the
first and the last round of the cipher. There exist three different versions of the
cipher, denoted Camellia-128, Camellia-192 and Camellia-256, depending
on the key size used. The number of iterations is 18 for the 128-bit version and
24 for the other two versions. A detailed description of Camellia's structure can
be found in the full version of the paper. For further details, one can refer to [4].

Previous Cryptanalysis. Since 2005, Camellia has been an international ISO/IEC
standard and has therefore attracted a lot of attention from the cryptographic
community. Since Camellia has a particular design, involving the so-called
$FL/FL^{-1}$ layers, its cryptanalysis can be classified into several categories. Some
attacks consider the $FL/FL^{-1}$ functions, while others do not take them into
consideration. Similarly, some attacks take into account the whitening keys, whereas
others do not, and, finally, not all attacks start from the same round. The best
attacks on Camellia in terms of the number of rounds and the complexities are
those presented in [22, Section 4.2].

Here we start by presenting improvements of the best attacks that include
the $FL/FL^{-1}$ layers and the whitening keys. Next we build an attack using the
state-test technique on 14-round Camellia-256 starting from the first round but
without the $FL/FL^{-1}$ layers and the whitening keys.


Improvements. We improve here the complexities of the previous attacks that
take into account the $FL/FL^{-1}$ layers and the whitening keys on all three
versions of Camellia. By using the complexity analysis introduced in Section 2,
we can optimize the complexities of the corresponding attacks from [22]. Note
that for this we use the same parameters as in [22]. The parameters of our attacks
on 11-round Camellia-128, 12-round Camellia-192 and 13-round Camellia-256
are given in Table 3. As can be seen in Table 2, the time complexity of
our improved attack on Camellia-128 is $2^{118.43} C_E$, with data complexity $2^{118.4}$
and memory complexity $2^{92.4}$. For Camellia-192, the time, data and memory
complexities are $2^{161.06} C_E$, $2^{119.7}$ and $2^{150.7}$ respectively, while for Camellia-256
the corresponding complexities are $2^{225.06} C_E$, $2^{119.71}$ and $2^{198.71}$.

5 In [24], the authors used a loose approximation for $C'_E$, namely $C'_E = 1/104$.


Table 3. Attack parameters against all versions of Camellia

Algorithm       |Δ_in|  |Δ_out|  r_in  r_out  r_Δ  c_in  c_out  |k_in ∪ k_out|
Camellia-128      23      80       1     2     8     32    57         96
Camellia-192      80      80       2     2     8     73    73        160
Camellia-256      80     128       2     3     8     73   121        224


Using the State-Test Technique on Camellia-256. We provide here an impossible
differential attack on Camellia-256 without the $FL/FL^{-1}$ layers and whitening
keys by using the state-test technique. Note that, unlike all previous
attacks, which do not start from the first round in order to take advantage of the
asymmetry of the key schedule, our attack starts from the first round of the cipher.
It covers 14 rounds of Camellia-256, which is, to the best of our knowledge, the highest
number of rounds attacked for this version. In [22] another attack on 14-round
Camellia-256 with $FL/FL^{-1}$ and whitening keys is presented; however, as said
before, it does not start from the first round and it uses a specific property of
the key schedule at the rounds where it is applied.

In this attack, we consider the same 8-round impossible differential as in [25]
and we add 4 + 2 rounds, such that $r_{in} = 4$, $r_{out} = 2$ and $r_\Delta = 8$. We have $|\Delta_{in}| = 128$,
$|\Delta_{out}| = 56$, $c_{in} = 120$ and $c_{out} = 48$. We then need at least $N_{min} = 2^{168}$
plaintext pairs. The amount of data needed to construct these pairs is

$$C_{N_{min}} = \max\left\{\sqrt{2^{168}\,2^{129-128}},\ 2^{168}\,2^{129-184}\right\} = 2^{113}.$$

There then remain $128 - 113 = 15$ bits of freedom. Thus, we can fix $s = 8$ bits of the ciphertexts to apply the
state-test technique on the 8 bits of the internal state at the penultimate round.
The number of information key bits is $|k_{in} \cup k_{out}| = 227 - 8 = 219$, since there
are 45 bits shared between the subkeys with respect to the key schedule. The
best attack is obtained with $N = 2^{118}$ pairs. In this case, the time complexity is
$2^{220} C_E$, the data complexity is $2^{118}$ plaintexts and the memory is $2^{118}$.

5 Conclusion 

To start with, we have proposed in this paper a generic vision of impossible
differential attacks with the aim of simplifying and helping the construction
and verification of this type of cryptanalysis. Until now, these attacks were very
tedious to mount and even more tedious to verify, and flaws very often appeared in
the computations. We believe that our objective has been successfully reached,
as can be seen from the large number of new improved attacks that we have been
able to propose, as well as from all the different possible trade-offs for each one of
them, something that would have been nearly unthinkable prior to our work.

Next, the generic and clear vision of impossible differential attacks has allowed
us to discover and propose new ideas for improving these attacks. In particular,
we have proposed the state-test technique, which allows reducing the number
of key bits involved in the attack, and thus the time complexity. We
have also formalized and adapted to our generic scenario the notion of multiple
impossible differentials introduced in [32]. This option allows reducing the data
complexity. Finally, we have proposed several applications to different variants
of the Feistel ciphers CLEFIA, Camellia, LBlock and Simon, providing, in most
cases, the best known attack on reduced-round versions of these ciphers.

We hope that these results will simplify and improve future impossible differential
attacks on Feistel ciphers, as well as their possible combination with other attacks. For
instance, a combination of impossible differential and linear attacks is
proposed in [35]. We have not verified these results, but this direction could be promising.
Abstract. We show that the so-called super S-box representation of
AES, which provides a simplified view of two consecutive AES rounds,
can be further simplified. In the untwisted representation of AES presented
here, two consecutive AES rounds are viewed as the composition
of a non-linear transformation S and an affine transformation R that
respectively operate on the four 32-bit columns and on the four 32-bit rows
of their 128-bit input. To illustrate that this representation can be helpful
for analysing the resistance of AES-like ciphers or AES-based hash functions
against some structural attacks, we present some improvements of
the known-key distinguisher for the 7-round variant of AES presented
by Knudsen and Rijmen at ASIACRYPT 2007. We first introduce a
known-key distinguisher for the 8-round variant of AES which constructs
a $2^{64}$-tuple of (input, output) pairs satisfying a simple integral property.
While this new 8-round known-key distinguisher is outperformed for 8
AES rounds by the known-key differential distinguishers of time complexity
$2^{48}$ and $2^{44}$ presented by Gilbert and Peyrin at FSE 2010 and by Jean,
Naya-Plasencia, and Peyrin at SAC 2013, we show that one can take
advantage of its specific features to mount a known-key distinguisher for
the 10-round AES with independent subkeys and the full AES-128. The
obtained 10-round distinguisher has the same time complexity $2^{64}$ as
the 8-round distinguisher it is derived from, but the highlighted input-output
correlation property is more intricate and therefore its impact on
the security of the 10-round AES when used as a known-key primitive,
e.g. in a hash function construction, is questionable. The new known-key
distinguishers do not affect at all the security of AES when used as a
keyed primitive, for instance for encryption or message authentication
purposes.


1 Introduction 

In this paper we present an alternative representation of AES. More precisely,
we show that AES can be viewed as the composition of elementary transformations
other than those originally used for the specification of its round function.
While one might wonder whether selecting any of the equivalent descriptions of
a cipher is more than an arbitrary convention, numerous examples illustrate that
the choice of an appropriate description may be very useful for highlighting some
of its structural features and may serve as a starting point for its cryptanalysis or for
optimised implementations. To take a simple example, it is well known that while
the so-called ladder representation of the Feistel scheme is strictly equivalent to
its more traditional twisted representation for any even number of rounds, it is
helpful for understanding some attacks against DES and DES-like ciphers, for
instance the Davies-Murphy attack [8].

In the case of AES, several alternative representations have been proposed
[9, 20] to highlight some aspects of its algebraic structure. These representations
allow one to relate the ciphertext to the plaintext using continued fractions
and algebraic equations over $GF(2^8)$, respectively. In [2] it was shown that numerous
dual ciphers of AES (i.e. equivalent descriptions of AES up to fixed, easy to
compute and to invert bijective mappings on the plaintexts, the ciphertexts, and
the keys) can be obtained by applying appropriately chosen modifications to
the irreducible polynomial used to represent $GF(2^8)$, the affine transformation
in the S-box, the coefficients of MixColumns, etc. This observation was further
extended in [3]. While these dual ciphers can be considered as equivalent
representations of AES, these representations essentially preserve the structure of
the round function of AES up to small variations in the exact parameters of
each elementary transformation. They are therefore closer to the original AES
than the equivalent representations we consider in this paper.

The starting point for the AES representation introduced here is the so-called
super S-box (or super-box) representation of two AES rounds, which allows one to
describe two consecutive AES rounds as the composition of one single non-linear
operation, namely a set of four parallel 32-bit to 32-bit key-dependent S-boxes,
and several affine transformations. This representation was introduced in [7] by
the designers of AES as a useful notion for the analysis of AES differentials over
two rounds. It was subsequently reused in [11, 12] and [18] in order to extend
so-called rebound attacks on AES-like permutations by at least one round: this
improved rebound technique, sometimes referred to as super S-box cryptanalysis,
was shown to be applicable in two related contexts, the cryptanalysis of AES-like
hash functions and the investigation of so-called known-key distinguishers
for AES-like block ciphers. Many recent improved distinguishers for reduced-round
versions of AES-like hash functions such as the SHA-3 candidates Grøstl
and ECHO use super S-boxes, e.g. [19, 16, 15].

We introduce a novel representation of two consecutive AES rounds that results
from an extra simplification of the super S-box representation. The simplification
relates to the description of the affine transformations that surround the
32-bit super S-boxes. We show that all these transformations can be replaced by
one simple 32-bit oriented affine transformation that operates on the rows of the
4 × 4 matrix of bytes representing the current state. We propose to name the
resulting view of two, or more generally r, AES rounds the untwisted representation,
since it avoids viewing the affine transformations that surround the super
S-boxes as column-oriented operations "twisted" by the action of the ShiftRows
transformation. The untwisted representation thus provides an equivalent
description of two consecutive AES rounds as the composition of:
- a non-linear transformation denoted by S (a shorthand for "super S-boxes")
that consists of the parallel application of four non-linear bijective mappings
which operate on the four 32-bit columns of the AES state. These four mappings
are essentially super S-boxes up to permutations of the four input bytes
and the four output bytes of each column;

- an affine transformation denoted by R (a shorthand for "MixRows") that
consists of the parallel application of four affine mappings which operate on
the four 32-bit rows of the AES state.




Fig. 1. Equivalent representation of two AES rounds as the composition $R \circ S$ of four
parallel non-linear bijections of the columns and four parallel affine bijections of the
rows of the input state


As shown in Figure 1, two consecutive AES rounds can thus be viewed as one
"super-round" that is the composition $R \circ S$ of $S$ and $R$. As will be shown in more
detail in the sequel, the small price to pay for this simplified view is that in
the resulting equivalent representation of $2r$ AES rounds as the composition of
$r$ super-rounds, the first (resp. last) super-round is preceded (resp. followed) by
a simple affine permutation.

While an alternative representation of a cipher can obviously be regarded 
in itself neither as a design nor as a cryptanalysis result, we believe that the 
simplicity of the new representation can play a significant heuristic role in the 
investigation of structural attacks on reduced-round versions of AES. Indeed, 
the new representation pushes the advantage of the super S-box representation 
of highlighting the 32-bit structure underlying the AES transformation one step 
further. 

To illustrate this alternative representation, we present extensions of the
known cryptanalytic results on reduced-round versions of AES in the so-called
known-key model. The known-key model was first introduced by Knudsen and
Rijmen at ASIACRYPT 2007. Attacks in this model are most often named known-key
distinguishers, and we will use this terminology in the sequel (see footnote 1). An integral
known-key distinguisher for the 7-round AES was introduced by Knudsen and Rijmen
in the same paper. We first present an improvement of this distinguisher whose idea was
inspired by the use of the untwisted representation of AES. This provides a
known-key distinguisher against the 8-round AES. While this distinguisher is
outperformed by the differential known-key distinguishers for the 8-round AES
of [12] and [14], whose respective complexities are $2^{48}$ and $2^{44}$, we show that one
can take advantage of its specific features, which reflect integral properties of the
8-round AES, to extend it by one outer round on both sides. We thus obtain
the first known-key distinguisher for the full 10-round AES. This known-key
distinguisher has the same time complexity $2^{64}$ (now measured as an equivalent
number of 10-round AES encryptions) as the 8-round distinguisher it
is derived from, but the highlighted input-output correlation property is more
intricate. We nevertheless provide some evidence that, unlike some generic known-key
distinguishers that are known to exist for block ciphers if the key size is sufficiently
small, the obtained distinguisher can reasonably be considered meaningful.
While in this paper we only investigate the security of AES in the
known-key model, it is worth mentioning a recent result on the security of AES in
a related but even stronger security model, namely the chosen-key distinguisher
on the 9-round AES-128 of [10].

1 This terminology may seem a bit confusing, since known-key distinguishers have
little to do with the notion of distinguisher one considers in more traditional
security models, namely a testing algorithm with an oracle access capability. On
the other hand, the wording known-key distinguisher probably conveys less risk of
misinterpretation than the wording known-key attack.

The rest of this paper is organized as follows. In Section 2, we introduce the
novel representation of two consecutive AES rounds and of $2r$ AES rounds. In
Section 3, we propose a definition of the known-key model, i.e. we define the
adversaries considered in this model, and we recall known impossibility results
on the resistance of block ciphers to all known-key distinguishers. In Section 4,
we show how to use the untwisted representation of AES to mount known-key
distinguishers for the 8-round AES and its extension to the full 10-round AES,
and we explain why the latter distinguisher can be considered meaningful.

2 A New Representation of AES 

Notational Conventions and Usual Representation of AES. Throughout
this paper we most often denote the composition of two mappings $F$ and $G$
multiplicatively by $F \bullet G$ instead of using the more classical notation $G \circ F$. The
advantage of this notation in the context considered here is that, when read from
left to right, it describes the successive transformations that are applied to the
input value.

Let us briefly recall the AES features that will be useful in the sequel and the associated notation. Each AES block is represented by a $4 \times 4$ matrix of bytes. While there are three standard versions of AES, of respective key lengths $k = 128$, 192, and 256 bits and respective numbers of rounds 10, 12, and 14, for the purpose of this paper we restrict ourselves, for the sake of simplicity, to the full 10-round AES-128 and reduced-round versions of this cipher². For $r \le 10$, the $r$-round version of the AES-128 encryption function is denoted by $\mathrm{AES}_r$ and is parametrized by $r + 1$ 128-bit subkeys denoted by $K_0$ to $K_r$. These subkeys are derived from a $k$-bit key $K$ by the key schedule; since the exact features of the AES-128 key schedule are not relevant for the analysis presented here, we do not detail them and refer to the full specification of AES for their description. Each round of the encryption function $\mathrm{AES}_r$ is the composition $SB \cdot SR \cdot MC \cdot AK$ of four transformations named SubBytes or SB, ShiftRows or SR, MixColumns or MC, and AddRoundKey or AK. SubBytes applies a fixed 8-bit to 8-bit bijective S-box to each input byte; ShiftRows circularly shifts each of the four 4-byte rows of the input state by 0, 1, 2, and 3 bytes to the left; MixColumns applies to each of the four-byte columns of the input state, viewed as a 4-coordinate vector with $GF(2^8)$ coefficients, a left multiplication by a fixed $4 \times 4$ matrix $M$ with $GF(2^8)$ coefficients; and at round $i \in [1; r]$, AddRoundKey or AK consists of a bytewise exclusive or of the input block with subkey $K_i$³. The first round of $\mathrm{AES}_r$ is preceded by a key addition with the subkey $K_0$, and the MixColumns operation is omitted in the last round. In the sequel we will sometimes also have to refer to the variant of $\mathrm{AES}_r$ where the MixColumns transformation is kept in the last round: we denote this variant by $\mathrm{AES}_r^+$. At the end of Section 4, we will also have to refer to the $r$-round variant of AES parametrized by $r + 1$ independent subkeys. Depending on whether the MixColumns transformation is omitted or kept in the last round, we denote this variant by $\mathrm{AES}_r^*$ or $\mathrm{AES}_r^{*+}$.

² However, since the AES properties we are investigating do not relate to the key schedule but to the data encryption part of the block cipher, which is the same for all AES versions, all the presented results are also applicable to reduced-round versions of AES-192 and AES-256.
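To make the round structure just recalled concrete, here is a minimal Python sketch of the four round transformations. It is an illustration only, under assumptions of ours: the state is a flat list of 16 byte values stored column by column (byte $a_i$ sits in row $i \bmod 4$ and column $\lfloor i/4 \rfloor$, matching the matrices used below), the S-box is rebuilt from its algebraic definition (inversion in $GF(2^8)$ followed by the standard affine map) rather than copied from a table, and all function names are hypothetical. Compositions follow the left-to-right order $F \cdot G$ of this paper.

```python
# Sketch of the AES round transformations on a column-major list of 16 bytes.
# State byte a_i sits in row i % 4 and column i // 4.

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]  # MixColumns matrix


def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Inverse in GF(2^8) computed as a^254 (so 0 is mapped to 0)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r


def rotl8(x: int, s: int) -> int:
    return ((x << s) | (x >> (8 - s))) & 0xFF


def sbox_entry(x: int) -> int:
    """AES S-box: field inversion followed by the affine map with constant 0x63."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def sb(s):
    """SubBytes: apply the S-box to each byte."""
    return [SBOX[v] for v in s]


def sr(s):
    """ShiftRows: rotate row r left by r positions, i.e. new (r, c) = old (r, c + r)."""
    return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]


def mc(s):
    """MixColumns: left-multiply each column by M over GF(2^8)."""
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            v = 0
            for k in range(4):
                v ^= gf_mul(M[r][k], col[k])
            out.append(v)
    return out


def ak(s, key):
    """AddRoundKey: bytewise xor with a 16-byte subkey."""
    return [v ^ k for v, k in zip(s, key)]


def aes_round(s, key, last=False):
    """One round SB . SR . MC . AK, read left to right; MC is omitted in a last round."""
    s = sr(sb(s))
    if not last:
        s = mc(s)
    return ak(s, key)
```

As a sanity check, `mc([0xDB, 0x13, 0x53, 0x45] * 4)` reproduces the usual MixColumns test column `[0x8E, 0x4D, 0xA1, 0xBC]` in every column.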


Super S-box Representation of 2 Consecutive AES Rounds. The super S-box representation allows one to view two consecutive AES rounds as the parallel invocation of four 32-bit to 32-bit mappings named super S-boxes, which are applied to the four columns of the AES state and surrounded by affine mappings. In more detail, since the transformations SB and SR commute and the composition of transformations is associative, the composition of two consecutive rounds:

$$SB \cdot SR \cdot MC \cdot AK \cdot SB \cdot SR \cdot MC \cdot AK$$

can be rewritten as:

$$SR \cdot (SB \cdot MC \cdot AK \cdot SB) \cdot SR \cdot MC \cdot AK.$$

We can notice that the middle term in brackets, i.e. $\mathrm{SuperSB} = SB \cdot MC \cdot AK \cdot SB$, where SuperSB stands for “Super S-boxes”, is the composition of transformations that all preserve the column-wise structure of the AES state. Thus SuperSB splits up into 4 parallel key-dependent bijective transformations of one column of the input state. It is surrounded by the left, resp. right, affine transformations $SR$, resp. $SR \cdot MC \cdot AK$. Each super S-box applies to its 4-byte input column the composition of 4 parallel S-box invocations, a left multiplication by the MixColumns matrix $M$, a xor with a 32-bit subkey column, and 4 final parallel S-box invocations.
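The rewriting above relies only on SB being bytewise (so that it commutes with the byte permutation SR) and on SB, MC, and AK preserving columns. The following self-contained Python sketch checks both facts numerically on random states. To keep it short it assumes a stand-in S-box, a seeded random byte permutation, since the identities hold for any bytewise map; the helper names and the column-major layout (byte $a_i$ in row $i \bmod 4$, column $\lfloor i/4 \rfloor$) are ours.

```python
import random

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
RNG = random.Random(2014)
SBOX = RNG.sample(range(256), 256)  # stand-in for the AES S-box (any byte bijection works here)


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def sb(s): return [SBOX[v] for v in s]
def sr(s): return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
def ak(s, key): return [v ^ k for v, k in zip(s, key)]


def mc(s):
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            v = 0
            for k in range(4):
                v ^= gf_mul(M[r][k], col[k])
            out.append(v)
    return out


def two_rounds(s, k1, k2):
    """SB . SR . MC . AK . SB . SR . MC . AK (applied left to right)."""
    return ak(mc(sr(sb(ak(mc(sr(sb(s))), k1)))), k2)


def super_sb(s, k1):
    """SuperSB = SB . MC . AK . SB: four parallel 32-bit super S-boxes."""
    return sb(ak(mc(sb(s)), k1))


def super_sbox_form(s, k1, k2):
    """SR . SuperSB . SR . MC . AK (applied left to right)."""
    return ak(mc(sr(super_sb(sr(s), k1))), k2)


def rand_state():
    return [RNG.randrange(256) for _ in range(16)]


def check(trials=100):
    for _ in range(trials):
        s, k1, k2 = rand_state(), rand_state(), rand_state()
        if two_rounds(s, k1, k2) != super_sbox_form(s, k1, k2):
            return False
        # Changing columns 1-3 must leave column 0 of the SuperSB output unchanged.
        other = s[:4] + rand_state()[4:]
        if super_sb(s, k1)[:4] != super_sb(other, k1)[:4]:
            return False
    return True
```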

³ Since AddRoundKey is parametrized by a subkey, the use of the notation AK, which suggests a fixed transformation, is a slight abuse of notation, but this notation is convenient in the context of this paper: in the sequel AK just stands for a xor with some constant, whose value does not affect the properties we consider.
Moving to the Untwisted Representation of 2 Consecutive AES Rounds. We now show how to move from the super S-box representation of two consecutive rounds to their untwisted representation as the composition $S \cdot R$ of four parallel column-wise non-linear transformations and four parallel row-wise affine transformations. We first observe that the periodic repetition, in $r$ iterations, of the 2-round pattern associated with the super S-box representation:

$$SR \cdot \mathrm{SuperSB} \cdot SR \cdot MC \cdot AK$$

can be equivalently viewed as the periodic repetition in $r$ iterations of the cyclically shifted periodic 2-round pattern:

$$\mathrm{SuperSB} \cdot SR \cdot MC \cdot AK \cdot SR$$

up to a minor correction, namely the left composition of the first iteration with $SR$ and the right composition of the last iteration with $SR^{-1}$. Now, in order to move to the aimed 2-round representation, the conducting idea is to left- and right-compose the SuperSB and $SR \cdot MC \cdot AK \cdot SR$ transformations using well chosen byte permutations $P$ and $Q$ and their inverses $P^{-1}$ and $Q^{-1}$. Due to the cancellation effect produced by the alternate use of these permutations and their inverses, $r$ iterations of the obtained 2-round description:

$$(Q^{-1} \cdot \mathrm{SuperSB} \cdot P^{-1}) \cdot (P \cdot SR \cdot MC \cdot AK \cdot SR \cdot Q)$$

give, for any choice of the two byte permutations, exactly the same product as $r$ iterations of the 2-round transformation they are derived from, up to a left composition of the first iteration by $Q^{-1}$ and a right composition of the last iteration by $Q$. In order for the byte permutations $P$ and $Q$ to provide the desired untwisted representation, they must satisfy the two following extra requirements:

- (i) the non-linear transformation $S = Q^{-1} \cdot \mathrm{SuperSB} \cdot P^{-1}$ must operate on columns;

- (ii) the affine transformation $R = P \cdot SR \cdot MC \cdot AK \cdot SR \cdot Q$ must operate on rows.


In order to describe the byte permutations we found that satisfy the above requirements, we introduce the following auxiliary byte permutations:

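- we denote by $T$ the matrix transposition that operates on $4 \times 4$ matrices of bytes as follows:

$$T: \begin{pmatrix} a_0 & a_4 & a_8 & a_{12} \\ a_1 & a_5 & a_9 & a_{13} \\ a_2 & a_6 & a_{10} & a_{14} \\ a_3 & a_7 & a_{11} & a_{15} \end{pmatrix} \mapsto \begin{pmatrix} a_0 & a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 & a_7 \\ a_8 & a_9 & a_{10} & a_{11} \\ a_{12} & a_{13} & a_{14} & a_{15} \end{pmatrix}$$

- we denote by $SC$ (or SwapColumns) the swapping of the second and fourth columns of the input state:

$$SC: \begin{pmatrix} a_0 & a_4 & a_8 & a_{12} \\ a_1 & a_5 & a_9 & a_{13} \\ a_2 & a_6 & a_{10} & a_{14} \\ a_3 & a_7 & a_{11} & a_{15} \end{pmatrix} \mapsto \begin{pmatrix} a_0 & a_{12} & a_8 & a_4 \\ a_1 & a_{13} & a_9 & a_5 \\ a_2 & a_{14} & a_{10} & a_6 \\ a_3 & a_{15} & a_{11} & a_7 \end{pmatrix}$$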
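Proposition 1. The byte permutations

$$P = SR \cdot T \cdot SR^{-1}: \begin{pmatrix} a_0 & a_4 & a_8 & a_{12} \\ a_1 & a_5 & a_9 & a_{13} \\ a_2 & a_6 & a_{10} & a_{14} \\ a_3 & a_7 & a_{11} & a_{15} \end{pmatrix} \mapsto \begin{pmatrix} a_0 & a_5 & a_{10} & a_{15} \\ a_3 & a_4 & a_9 & a_{14} \\ a_2 & a_7 & a_8 & a_{13} \\ a_1 & a_6 & a_{11} & a_{12} \end{pmatrix}$$

and

$$Q = SR^{-1} \cdot T \cdot SR \cdot SC: \begin{pmatrix} a_0 & a_4 & a_8 & a_{12} \\ a_1 & a_5 & a_9 & a_{13} \\ a_2 & a_6 & a_{10} & a_{14} \\ a_3 & a_7 & a_{11} & a_{15} \end{pmatrix} \mapsto \begin{pmatrix} a_0 & a_7 & a_{10} & a_{13} \\ a_1 & a_4 & a_{11} & a_{14} \\ a_2 & a_5 & a_8 & a_{15} \\ a_3 & a_6 & a_9 & a_{12} \end{pmatrix}$$

satisfy the requirements (i) and (ii) and thus result in the desired untwisted representation.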


Proof sketch.

(i): It is easy to see that $P$, $Q$, and their inverses operate on columns. Therefore $S = Q^{-1} \cdot \mathrm{SuperSB} \cdot P^{-1}$ also operates on columns.

(ii): We can simplify the expression of $R$:

$$\begin{aligned} R &= P \cdot SR \cdot MC \cdot AK \cdot SR \cdot Q \\ &= SR \cdot T \cdot SR^{-1} \cdot SR \cdot MC \cdot AK \cdot SR \cdot SR^{-1} \cdot T \cdot SR \cdot SC \\ &= SR \cdot T \cdot MC \cdot AK \cdot T \cdot SR \cdot SC \end{aligned}$$

Since $T \cdot MC \cdot T$, and therefore $T \cdot MC \cdot AK \cdot T$, operates on rows, and $SR$ and $SC$ also operate on rows, $R$ operates on rows. □
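The byte permutations of Proposition 1 can be checked mechanically. The Python sketch below assumes the column-major layout (byte $a_i$ in row $i \bmod 4$, column $\lfloor i/4 \rfloor$) and uses hypothetical helper names; it builds $P$ and $Q$ from SR, T, and SC in the left-to-right composition order, compares the results with the matrices of Proposition 1, and confirms that both permutations operate on columns.

```python
def sr(s):      # ShiftRows: new (r, c) = old (r, c + r)
    return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]


def sr_inv(s):  # inverse ShiftRows: new (r, c) = old (r, c - r)
    return [s[4 * ((c - r) % 4) + r] for c in range(4) for r in range(4)]


def t(s):       # transposition T: new (r, c) = old (c, r)
    return [s[4 * r + c] for c in range(4) for r in range(4)]


def sc(s):      # SwapColumns SC: exchange the second and fourth columns
    return [s[4 * [0, 3, 2, 1][c] + r] for c in range(4) for r in range(4)]


def p(s):       # P = SR . T . SR^-1 (SR is applied first)
    return sr_inv(t(sr(s)))


def q(s):       # Q = SR^-1 . T . SR . SC (SR^-1 is applied first)
    return sc(sr(t(sr_inv(s))))


def operates_on_columns(image):
    """True if every output column only contains bytes of the same input column."""
    return all(image[4 * c + r] // 4 == c for c in range(4) for r in range(4))


IDX = list(range(16))
# Column-major images of (a_0, ..., a_15), read off the matrices of Proposition 1.
P_EXPECTED = [0, 3, 2, 1, 5, 4, 7, 6, 10, 9, 8, 11, 15, 14, 13, 12]
Q_EXPECTED = [0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, 12]
```

The same helpers also confirm the identity $IP = SR \cdot Q = T \cdot SR \cdot SC$ used later, since `q(sr(s))` and `sc(sr(t(s)))` coincide.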


The linear part of the row-wise affine transformation $R$ determined by $P$ and $Q$ is described by the four following circulant matrices $R_i$, $i = 0$ to 3. Each matrix $R_i$ operates on a 4-byte row vector $x_i$ that represents row $i$ of the input block of $R$ and produces the 4-byte row vector $y_i = x_i \cdot R_i$ that represents row $i$ of the linear part of the image of the input block by $R$. The coefficients of the $R_i$ are those of the MixColumns matrix $M$ (in a different order).


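$$R_0 = R_2 = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 3 & 1 & 1 & 2 \\ 1 & 1 & 2 & 3 \\ 1 & 2 & 3 & 1 \end{pmatrix}, \qquad R_1 = R_3 = \begin{pmatrix} 1 & 1 & 2 & 3 \\ 1 & 2 & 3 & 1 \\ 2 & 3 & 1 & 1 \\ 3 & 1 & 1 & 2 \end{pmatrix}, \qquad M = \begin{pmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{pmatrix}$$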
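The row matrices $R_i$ can also be recovered mechanically. The following Python sketch, an illustration with hypothetical helper names and the same column-major layout, pushes a symbolic state through the linear part $P \cdot SR \cdot MC \cdot SR \cdot Q$ of $R$ (AK only adds a constant and is dropped), checks that every output byte depends only on input bytes of the same row, as requirement (ii) states, and reads off each $R_i$ with the convention $y_i = x_i \cdot R_i$. Since MC is the only mixing step and is applied once, each coefficient is simply an entry of $M$, so no $GF(2^8)$ arithmetic is needed.

```python
M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def sr(s): return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
def sr_inv(s): return [s[4 * ((c - r) % 4) + r] for c in range(4) for r in range(4)]
def t(s): return [s[4 * r + c] for c in range(4) for r in range(4)]
def sc(s): return [s[4 * [0, 3, 2, 1][c] + r] for c in range(4) for r in range(4)]
def p(s): return sr_inv(t(sr(s)))          # P = SR . T . SR^-1
def q(s): return sc(sr(t(sr_inv(s))))      # Q = SR^-1 . T . SR . SC


def mc_symbolic(s):
    """MixColumns on entries {input_index: coefficient}; each entry must be a single input byte."""
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            term = {}
            for k in range(4):
                (idx, coef), = col[k].items()
                assert coef == 1
                term[idx] = M[r][k]
            out.append(term)
    return out


def row_matrices():
    """Return [R_0, R_1, R_2, R_3] with R_i[a][b] = coefficient of x_{i,a} in y_{i,b}."""
    state = [{i: 1} for i in range(16)]
    out = q(sr(mc_symbolic(sr(p(state)))))  # linear part of P . SR . MC . AK . SR . Q
    mats = []
    for row in range(4):
        mat = [[0] * 4 for _ in range(4)]
        for col in range(4):
            for idx, coef in out[4 * col + row].items():
                assert idx % 4 == row, "R must operate on rows"
                mat[idx // 4][col] = coef
        mats.append(mat)
    return mats
```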


Remark. $(P, Q)$ is not the unique pair of byte permutations that satisfy requirements (i) and (ii). Given any permutations $\sigma$ and $\tau$ of the set $\{0, 1, 2, 3\}$, let us denote by $C_\sigma$, resp. $D_\tau$, the associated column and row permutations, which on input a 4-tuple $(x_0, x_1, x_2, x_3)$ of columns, resp. of rows, produce the permuted 4-tuple $(x'_0, x'_1, x'_2, x'_3)$ of columns, resp. of rows, given by $x'_{\sigma(i)} = x_i$, resp. $x'_{\tau(i)} = x_i$, i.e. $x'_i = x_{\sigma^{-1}(i)}$, resp. $x'_i = x_{\tau^{-1}(i)}$, $i = 0$ to 3. It is easy to see that all the pairs of byte permutations $(P_{\sigma,\tau}, Q_{\sigma,\tau}) = (C_\sigma \cdot D_\tau \cdot P, \; Q \cdot D_\tau^{-1} \cdot C_\sigma^{-1})$ also satisfy requirements (i) and (ii). We will however only use $(P, Q)$ in the sequel.
Resulting Untwisted Representation of $\mathrm{AES}_{2r}^+$ and $\mathrm{AES}_{2r}$. The former 2-round untwisted representation of two consecutive AES rounds immediately results in the following equivalent untwisted description of the $2r$-round version $\mathrm{AES}_{2r}^+$ of the encryption function of AES (in which the MixColumns transformation is kept in the last round):

$$\mathrm{AES}_{2r}^+ = AK \cdot IP \cdot (S \cdot R)^r \cdot FP,$$

where the initial and final permutations $IP$ and $FP$ are the byte permutations given by:

$$IP = SR \cdot Q = T \cdot SR \cdot SC; \qquad FP = Q^{-1} \cdot SR^{-1} = IP^{-1}.$$

This representation of $\mathrm{AES}_{2r}^+$ is illustrated in Figure 2. To confirm the equivalence of the above representation of $\mathrm{AES}_{2r}^+$ with its usual representation using SB, SR, MC, and AK, implementations based on both representations were checked to provide equal output values on a few input values.
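A self-contained Python sketch of such an equivalence check follows; it is our illustration, not the authors' code. It assumes the column-major state layout (byte $a_i$ in row $i \bmod 4$, column $\lfloor i/4 \rfloor$), independent random subkeys $K_0, \ldots, K_{2r}$ (the identity does not depend on the key schedule), and hypothetical function names; the S-box and MixColumns follow the AES specification. In each $S \cdot R$ pair, $S$ uses $K_{2j-1}$ and $R$ uses $K_{2j}$.

```python
import random

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]


def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def rotl8(v, s):
    return ((v << s) | (v >> (8 - s))) & 0xFF


def sbox_entry(x):
    inv = 1
    for _ in range(254):          # x^254 is the field inverse of x (and 0 -> 0)
        inv = gf_mul(inv, x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def sb(s): return [SBOX[v] for v in s]
def ak(s, key): return [v ^ k for v, k in zip(s, key)]
def sr(s): return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
def sr_inv(s): return [s[4 * ((c - r) % 4) + r] for c in range(4) for r in range(4)]
def t(s): return [s[4 * r + c] for c in range(4) for r in range(4)]
def sc(s): return [s[4 * [0, 3, 2, 1][c] + r] for c in range(4) for r in range(4)]


def mc(s):
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            v = 0
            for k in range(4):
                v ^= gf_mul(M[r][k], col[k])
            out.append(v)
    return out


def p(s): return sr_inv(t(sr(s)))          # P = SR . T . SR^-1; P is an involution, so P^-1 = P
def q(s): return sc(sr(t(sr_inv(s))))      # Q = SR^-1 . T . SR . SC
def q_inv(s): return sr(t(sr_inv(sc(s))))  # Q^-1 = SC . SR^-1 . T . SR


def s_layer(s, k):
    """S = Q^-1 . SuperSB . P^-1 with SuperSB = SB . MC . AK . SB (column-wise)."""
    return p(sb(ak(mc(sb(q_inv(s))), k)))


def r_layer(s, k):
    """R = P . SR . MC . AK . SR . Q (row-wise affine)."""
    return q(sr(ak(mc(sr(p(s))), k)))


def aes_plus(x, keys):
    """Usual AES_{2r}^+ with independent subkeys K_0..K_{2r}; MC kept in the last round."""
    s = ak(x, keys[0])
    for k in keys[1:]:
        s = ak(mc(sr(sb(s))), k)
    return s


def aes_plus_untwisted(x, keys):
    """AK . IP . (S . R)^r . FP with IP = SR . Q and FP = Q^-1 . SR^-1."""
    s = q(sr(ak(x, keys[0])))
    for j in range(1, len(keys), 2):
        s = r_layer(s_layer(s, keys[j]), keys[j + 1])
    return sr_inv(q_inv(s))
```

Since the identity is purely structural, the same check also passes with any byte bijection in place of `SBOX`.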


[Figure 2: AK and IP, followed by five $S \cdot R$ layers and FP.]

Fig. 2. Equivalent representation of $\mathrm{AES}_{10}^+$. IP and FP are permutations of the byte positions.

The former representation of $\mathrm{AES}_{2r}^+$ can be used to derive a first representation of $\mathrm{AES}_{2r}$, which will be used in the sequel to mount a known-key distinguisher for $\mathrm{AES}_8$. The right composition of $\mathrm{AES}_{2r}^+$ with an appropriate conjugate of $MC^{-1}$ is required in order to cancel out the MixColumns operation in the last round. If one “develops” the last occurrence of $R$ and simplifies the obtained expression, one obtains the equality:

$$\mathrm{AES}_{2r} = AK \cdot IP \cdot (S \cdot R)^{r-1} \cdot S \cdot P \cdot SR \cdot AK.$$

We also introduce a second equivalent representation of $\mathrm{AES}_{2r}$ that will be used in the sequel to mount a known-key distinguisher for $\mathrm{AES}_{10}$: we start from an equivalent representation of the $2(r-1)$-round version $\mathrm{AES}_{2(r-1)}^+$ of AES, apply a left composition with a full round and a right composition with a last round without MixColumns, and simplify the obtained expression using the equality $R = P \cdot SR \cdot MC \cdot AK \cdot SR \cdot Q$:

$$\begin{aligned} \mathrm{AES}_{2r} &= (AK \cdot SB \cdot SR \cdot MC) \cdot \mathrm{AES}_{2(r-1)}^+ \cdot (SB \cdot SR \cdot AK) \\ &= AK \cdot SB \cdot SR \cdot MC \cdot AK \cdot SR \cdot Q \cdot (S \cdot R)^{r-1} \cdot Q^{-1} \cdot SR^{-1} \cdot SB \cdot SR \cdot AK \\ &= AK \cdot SB \cdot P^{-1} \cdot R \cdot (S \cdot R)^{r-1} \cdot Q^{-1} \cdot SB \cdot AK \\ &= AK \cdot P^{-1} \cdot SB \cdot R \cdot (S \cdot R)^{r-1} \cdot SB \cdot Q^{-1} \cdot AK \end{aligned}$$

Thus $\mathrm{AES}_{2r}$ can be equivalently viewed as a middle transformation $R \cdot (S \cdot R)^{r-1}$ preceded and followed by simplified initial and final “external rounds”, namely $AK \cdot P^{-1} \cdot SB$ and $SB \cdot Q^{-1} \cdot AK$.
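Both representations of $\mathrm{AES}_{2r}$ can be checked numerically in the same way. The Python sketch below is our illustration under stated assumptions: column-major state layout, independent random subkeys $K_0, \ldots, K_{2r}$, hypothetical function names, and a stand-in S-box (a seeded random byte permutation), which is legitimate here because the derivations only use the fact that SB is bytewise. Note the subkey indexing: in the second representation the first $R$ uses $K_1$, and each subsequent $S \cdot R$ pair uses $K_{2j}$ and $K_{2j+1}$.

```python
import random

M = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
RNG = random.Random(7)
SBOX = RNG.sample(range(256), 256)  # stand-in S-box: any byte bijection works for these identities


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def sb(s): return [SBOX[v] for v in s]
def ak(s, key): return [v ^ k for v, k in zip(s, key)]
def sr(s): return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
def sr_inv(s): return [s[4 * ((c - r) % 4) + r] for c in range(4) for r in range(4)]
def t(s): return [s[4 * r + c] for c in range(4) for r in range(4)]
def sc(s): return [s[4 * [0, 3, 2, 1][c] + r] for c in range(4) for r in range(4)]


def mc(s):
    out = []
    for c in range(4):
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            v = 0
            for k in range(4):
                v ^= gf_mul(M[r][k], col[k])
            out.append(v)
    return out


def p(s): return sr_inv(t(sr(s)))          # P (an involution, so P^-1 = P)
def q(s): return sc(sr(t(sr_inv(s))))      # Q
def q_inv(s): return sr(t(sr_inv(sc(s))))  # Q^-1
def s_layer(s, k): return p(sb(ak(mc(sb(q_inv(s))), k)))  # S = Q^-1 . SB . MC . AK . SB . P^-1
def r_layer(s, k): return q(sr(ak(mc(sr(p(s))), k)))      # R = P . SR . MC . AK . SR . Q


def aes(x, keys):
    """Usual AES_{2r} with subkeys K_0..K_{2r}; MixColumns omitted in the last round."""
    s = ak(x, keys[0])
    for k in keys[1:-1]:
        s = ak(mc(sr(sb(s))), k)
    return ak(sr(sb(s)), keys[-1])


def first_rep(x, keys):
    """AK . IP . (S . R)^(r-1) . S . P . SR . AK."""
    s = q(sr(ak(x, keys[0])))                 # AK, then IP = SR . Q
    for j in range(1, len(keys) - 2, 2):
        s = r_layer(s_layer(s, keys[j]), keys[j + 1])
    s = s_layer(s, keys[-2])
    return ak(sr(p(s)), keys[-1])


def second_rep(x, keys):
    """AK . P^-1 . SB . R . (S . R)^(r-1) . SB . Q^-1 . AK."""
    s = r_layer(sb(p(ak(x, keys[0]))), keys[1])
    for j in range(2, len(keys) - 1, 2):
        s = r_layer(s_layer(s, keys[j]), keys[j + 1])
    return ak(q_inv(sb(s)), keys[-1])
```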
3 The Known-Key Model

We believe that the untwisted AES representation introduced above can potentially help analysing known structural attacks of reduced-round versions of AES, AES-like ciphers, or AES-based hash functions⁴. In the next section we will present two “attacks” that substantiate this belief. They both happen to belong to a quite specific class of structural attacks, the so-called known-key distinguishers, and respectively relate to a reduced-round version of AES and to the full 10-round AES-128. In this section we introduce the underlying security model, which is named the known-key model. This model was inspired by the cryptanalysis of hash functions and first introduced by Knudsen and Rijmen in [?]. The difference between the known-key model and the usual security model considered for block ciphers can be outlined as follows.

- In the usual model, the adversary is given black-box (oracle) access to an instance of the encryption function associated with a random secret key and its inverse, and must find the key or, more generally, efficiently distinguish the encryption function from a perfect random permutation;

- In the known-key model, the adversary is given white-box (i.e. full) access to an instance of the encryption function associated with a known random key and its inverse, and her purpose is to simultaneously control the inputs and the outputs of the primitive, i.e. to achieve input-output correlations she could not efficiently achieve with the inputs and outputs of a perfect random permutation to which she would have oracle access.

We now propose a more detailed definition of the known-key model, i.e. of the adversaries considered in this model, which are named known-key distinguishers. In order to capture the idea that the goal of such adversaries is to derive an $N$-tuple of input blocks of the considered block cipher $E$ that is “abnormally correlated” with the corresponding $N$-tuple of output blocks, we first introduce the notion of $T$-intractable relation on $N$-tuples of $E$ blocks. This notion (which is independent of $E$, up to the fact that, for the sake of simplicity, we use the time complexity of $E$ as the unit for quantifying time complexities) is closely related to the notion of correlation intractable relation proposed in [6]. It essentially expresses that it is difficult to derive, from oracle queries to a random permutation and its inverse, an $N$-tuple of input/output pairs satisfying the relation.

Definition 1 (T-Intractable Relation). Let $E: (K, X) \in \{0,1\}^k \times \{0,1\}^n \mapsto E_K(X) \in \{0,1\}^n$ denote a block cipher of block size $n$ bits. Let $N \ge 1$ and $\mathcal{R}$ denote an integer and any relation⁵ over the set $S$ of $N$-tuples of $n$-bit blocks. $\mathcal{R}$ is said to be $T$-intractable relatively to $E$ if, given any algorithm $\mathcal{A}'$ that is given oracle access to a perfect random permutation $\Pi$ of $\{0,1\}^n$ and its inverse, it is impossible for $\mathcal{A}'$ to construct in time $T' \le T$ two $N$-tuples $\mathcal{X}' = (X'_i)$ and $\mathcal{Y}' = (Y'_i)$ such that $Y'_i = \Pi(X'_i)$, $i = 1, \ldots, N$, and $\mathcal{X}' \,\mathcal{R}\, \mathcal{Y}'$ with a success probability $p' \ge \frac{1}{2}$ over $\Pi$ and the random choices of $\mathcal{A}'$. The computing time $T'$ of $\mathcal{A}'$ is measured as an equivalent number of computations of $E$, with the convention that the time needed for one oracle query to $\Pi$ or $\Pi^{-1}$ is equal to 1. Thus if $q'$ denotes the number of queries of $\mathcal{A}'$ to $\Pi$ or $\Pi^{-1}$, then $q' \le T'$.

⁴ By structural attacks we mean here attacks that, unlike statistical attacks, e.g. differential and linear cryptanalysis, do not consider the detail of the algorithm's elementary ingredients such as the S-boxes, but put more emphasis on their overall construction, their use of transformations that preserve the byte structure or the 32-bit structure of the data, etc.

⁵ Let us recall that for any set $S$, a relation $\mathcal{R}$ over $S$ can be defined as a subset of the Cartesian product $S \times S$, and that for any pair $(a, b)$ of $S \times S$, $a \,\mathcal{R}\, b$ means that $(a, b)$ belongs to this subset.

Definition 2 (Known-Key Distinguisher). Let $E: (K, X) \in \{0,1\}^k \times \{0,1\}^n \mapsto E_K(X) \in \{0,1\}^n$ denote a block cipher of block size $n$ bits. A known-key distinguisher $(\mathcal{R}, \mathcal{A})$ of order $N \ge 1$ consists of (1) a relation $\mathcal{R}$ over the $N$-tuples of $n$-bit blocks and (2) an algorithm $\mathcal{A}$ that, on input a $k$-bit key $K$, produces in time $T_{\mathcal{A}}$, i.e. in time equivalent to $T_{\mathcal{A}}$ computations of $E$, an $N$-tuple $\mathcal{X} = (X_i)_{i=1 \ldots N}$ of plaintext blocks and an $N$-tuple $\mathcal{Y} = (Y_i)_{i=1 \ldots N}$ of ciphertext blocks related by $Y_i = E_K(X_i)$. The two following conditions must be met:

(i) The relation $\mathcal{R}$ must be $T_{\mathcal{A}}$-intractable relatively to $E$.

(ii) The validity of $\mathcal{R}$ must be efficiently checkable: we formalize this requirement by incorporating the time for checking whether two $N$-tuples are related by $\mathcal{R}$ in the computing time $T_{\mathcal{A}}$ of algorithm $\mathcal{A}$⁶.

It is important for the sequel to notice that in the former definition, while the algorithm $\mathcal{A}$ takes a random key $K$ as input, the relation $\mathcal{R}$ satisfied by the $N$-tuples of input and output blocks constructed by $\mathcal{A}$ is the same for all values of $K$ and must be efficiently checkable without knowing $K$.

Example 1. The following example of a known-key distinguisher of order $N = 2$ illustrates the link between the use of block ciphers for hashing purposes and their security in the known-key model. Let $E$ denote a block cipher of key length $k$ bits and block length $n$ bits, and let $(X_1, X_2)$ and $(Y_1, Y_2)$ denote two pairs of $n$-bit blocks. We define the relation $(X_1, X_2) \,\mathcal{R}\, (Y_1, Y_2)$ by the conditions $X_1 \ne X_2$ and $X_1 \oplus Y_1 = X_2 \oplus Y_2$. The definition of relation $\mathcal{R}$ obviously implies that if $E$ is vulnerable to a known-key distinguisher $(\mathcal{R}, \mathcal{A})$ of complexity $T \ll 2^{n/2}$, then the compression function $h: \{0,1\}^k \times \{0,1\}^n \to \{0,1\}^n : (K, X) \mapsto X \oplus E_K(X)$ derived from $E$ using the Matyas-Meyer-Oseas construction is vulnerable to a collision attack of complexity $T$ that is more powerful than any generic collision attack against $h$⁷.
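The reduction from this relation to a Matyas-Meyer-Oseas collision is a one-line identity: if $X_1 \oplus Y_1 = X_2 \oplus Y_2$ with $Y_i = E_K(X_i)$, then $h(K, X_1) = h(K, X_2)$. The Python sketch below illustrates it with a hypothetical, deliberately weak cipher $E_K(X) = X \oplus K$, for which every pair of distinct inputs satisfies the relation (so $T = 2$); the cipher and all function names are ours and serve only as an example.

```python
def mmo(E, k, x):
    """Matyas-Meyer-Oseas compression function h(K, X) = X xor E_K(X) on integer blocks."""
    return x ^ E(k, x)


def relation(x1, x2, y1, y2):
    """Relation of Example 1: X1 != X2 and X1 xor Y1 == X2 xor Y2."""
    return x1 != x2 and (x1 ^ y1) == (x2 ^ y2)


def weak_cipher(k, x):
    """Hypothetical, deliberately weak 64-bit cipher E_K(X) = X xor K."""
    return (x ^ k) & (2 ** 64 - 1)


def distinguisher(k):
    """Algorithm A for weak_cipher: any two distinct inputs work, so T = 2."""
    x1, x2 = 0, 1
    return x1, x2, weak_cipher(k, x1), weak_cipher(k, x2)
```

Whenever `relation(x1, x2, y1, y2)` holds for outputs of the cipher, `mmo` returns the same value on `x1` and `x2`, i.e. a collision of $h$.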

In the next example and throughout the rest of this paper, we use the following notation to describe integral properties of partial AES encryptions and decryptions.

Notation. Let $F: \{0,1\}^n \to \{0,1\}^n$ denote any mapping over the block space and let us consider the transformation by $F$ of a structure $\mathcal{X}$ of $N = 2^{8m}$ blocks, $m < 16$. An input or output byte $b_i$, $i \in \{0, \ldots, 15\}$, of $F$ is said to be constant and marked C if it takes one constant value. It is said to be uniform and marked U if it takes each of the $2^8$ possible values exactly $2^{8(m-1)}$ times. An $s$-tuple $(b_{i_1}, \ldots, b_{i_s})$, where $s \le m$ and $i_1, \ldots, i_s \in \{0, \ldots, 15\}$, of input or output bytes of $F$ is said to be uniform and marked $U_1 \cdots U_s$ if $(b_{i_1}, \ldots, b_{i_s})$ takes each of the $2^{8s}$ possible $s$-tuple values exactly $2^{8(m-s)}$ times.

⁶ This avoids specifying an explicit upper bound on the time complexity for checking whether two $N$-tuples are related by $\mathcal{R}$. In practice one typically expects the time complexity for checking $\mathcal{R}$ to be at most that of $N$ computations of $E$.

⁷ It could be shown that if $T \ll 2^{n/2}$, $\mathcal{R}$ is $T$-intractable.
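The C and U markings are simple counting conditions. The Python sketch below (hypothetical helper names; blocks as 16-byte strings) computes them for a given structure; the marker `-` for a byte that is neither constant nor uniform is our own convention, not part of the notation above.

```python
from collections import Counter


def mark_byte(blocks, i):
    """Mark byte i of a structure of N = 2^(8m) blocks as 'C', 'U', or '-' (neither)."""
    counts = Counter(b[i] for b in blocks)
    if len(counts) == 1:
        return "C"
    if len(counts) == 256 and all(256 * v == len(blocks) for v in counts.values()):
        return "U"
    return "-"


def is_uniform_tuple(blocks, positions):
    """True if the s bytes at `positions` take each of the 2^(8s) values exactly N / 2^(8s) times."""
    values = 256 ** len(positions)
    counts = Counter(tuple(b[i] for i in positions) for b in blocks)
    return len(counts) == values and all(values * v == len(blocks) for v in counts.values())
```

For instance, on the structure of $2^{16}$ blocks (so $m = 2$) whose first two bytes range over all pairs $(a, b)$, whose third byte is $a \oplus b$, and whose other bytes are constant, the first three bytes are marked U, the remaining ones C, and the pairs of bytes $(0, 1)$ and $(0, 2)$ are uniform.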

Example 2. The known-key distinguisher for $\mathrm{AES}_7$ introduced by Knudsen and Rijmen in [?] uses a relation $\mathcal{R}$ of order $N = 2^{56}$ that exploits integral properties of partial AES encryptions and decryptions. The following integral properties are used:

where $4r$ denotes 4 consecutive AES encryption rounds without MixColumns in the last round and $-3r$ denotes 3 full AES decryption rounds. These properties imply that if a middle structure $\mathcal{Z}$ of $N = 2^{56}$ blocks is chosen so as to satisfy the properties of the intermediate block of the scheme below, then by applying 4 forward encryption rounds and 3 backward decryption rounds to this structure one obtains an $N$-tuple of (plaintext, ciphertext) pairs that satisfy the relation $\mathcal{R}$ defined by the conditions that (1) the $N$ input blocks are pairwise distinct and (2) each of the 16 input bytes and each of the 16 output bytes is uniformly distributed.
While $\mathcal{R}$ could be shown to be $N$-intractable by the same kind of arguments as those used in the next section, we do not give a detailed proof here. The authors of [?] do not use exactly the same notion of $T$-intractable relation, but conjecture the related, somewhat stronger, property that “for a randomly chosen 128-bit permutation, finding a collection of $2^{56}$ texts in similar time, using similar (little) memory and with similar properties as in the case of 7-round AES has a probability of succeeding which is very close to zero”.


Example 3. In [12] a known-key distinguisher of order $N = 2$ for $\mathrm{AES}_8$ of time complexity $T = 2^{48}$, memory about $2^{32}$, and success probability close to 1 is described. The associated relation $\mathcal{R}$ is differential in nature. It is defined as follows: $(X_1, X_2) \,\mathcal{R}\, (Y_1, Y_2)$ if and only if $X_1 \ne X_2$, the only non-zero bytes of the input difference $X_1 \oplus X_2$ are the diagonal bytes, i.e. the bytes numbered 0, 5, 10, and 15, and the only non-zero bytes of the output difference $Y_1 \oplus Y_2$ are the four bytes numbered 0, 7, 10, and 13. It was shown in [?] that, given a perfect random permutation $\Pi$, the best method to get an input pair $(X_1, X_2)$ and an output pair $(Y_1, Y_2) = (\Pi(X_1), \Pi(X_2))$ satisfying $(X_1, X_2) \,\mathcal{R}\, (Y_1, Y_2)$ is the so-called limited birthday technique, which requires about $2^{65}$ oracle queries for a target success probability of about $\frac{1}{2}$. With only $T = 2^{48}$ oracle queries, the success probability of this best method would decrease to about $2^{-17}$.
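Checking this differential relation is immediate. The Python sketch below uses hypothetical names, represents blocks as 16-byte strings numbered column by column as in Section 2, and reads each condition on a difference as “the non-zero bytes lie within the given positions”, which is our interpretation of the wording above.

```python
DIAGONAL = {0, 5, 10, 15}     # byte positions allowed to differ between X1 and X2
OUTPUT_POS = {0, 7, 10, 13}   # byte positions allowed to differ between Y1 and Y2


def diff_support(a: bytes, b: bytes) -> set:
    """Positions of the non-zero bytes of a xor b."""
    return {i for i in range(16) if a[i] != b[i]}


def example3_relation(x1: bytes, x2: bytes, y1: bytes, y2: bytes) -> bool:
    """(X1, X2) R (Y1, Y2) for the differential known-key distinguisher of Example 3."""
    return (x1 != x2
            and diff_support(x1, x2) <= DIAGONAL
            and diff_support(y1, y2) <= OUTPUT_POS)
```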

Example 4. When applied to block ciphers, so-called zero-sum distinguishers [?], which thanks to higher order differential properties produce structures $(X_i, Y_i)_{i=1 \ldots N}$ of $N$ (input, output) pairs such that $\bigoplus_{i=1}^{N} X_i = \bigoplus_{i=1}^{N} Y_i = 0$, also represent examples of known-key distinguishers.
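The zero-sum condition itself is easy to state in code. The following toy Python sketch only illustrates the principle behind such distinguishers, not any real higher-order differential attack: it uses a hypothetical map of algebraic degree 1 on 8-bit words, for which the outputs over any affine subspace of dimension at least 2 xor to zero; actual zero-sum distinguishers exploit the same fact for maps of higher but bounded degree.

```python
from functools import reduce
from operator import xor


def is_zero_sum(pairs):
    """True if the xor of all inputs and the xor of all outputs are both zero."""
    xs, ys = zip(*pairs)
    return reduce(xor, xs, 0) == 0 and reduce(xor, ys, 0) == 0


def toy_affine(x):
    """Hypothetical degree-1 map on 8-bit words: rotate left by 3, then xor 0x5A."""
    return (((x << 3) | (x >> 5)) & 0xFF) ^ 0x5A


# Affine subspace of dimension 4: the high nibble is fixed, the low nibble varies.
structure = [0xA0 | v for v in range(16)]
pairs = [(x, toy_affine(x)) for x in structure]
```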

Impossibility Results on the Resistance of Block Ciphers to all Known-Key Distinguishers. Specifying the requirements on the resistance of a block cipher $E$ against known-key distinguishers is a notoriously difficult issue because of an impossibility result that was first pointed out by Canetti, Goldreich, and Halevi in [6]. While the notion of correlation intractability was originally used to state this result, the related notion of resistance against known-key distinguishers can be used to reformulate it as follows:

Proposition 2. Every block cipher of key length $k$ bits and block length $n$ bits such that $k \le n$ is vulnerable to a known-key distinguisher of order 1 and complexity about one computation of $E$.

Proof sketch. In order to give the intuition of the proof, let us restrict ourselves to the situation where $k = n$. It suffices to use the whole specification of $E$ in the definition of $\mathcal{R}$ to get the claimed result. Let us define $X \,\mathcal{R}\, Y$, where $X$ and $Y$ are any $n$-bit blocks, by the condition $Y = E_X(X)$. Given any known $k$-bit key $K$, the easy to compute values $X = K$ and $Y = E_K(K)$ are related by $E_K$ and satisfy $X \,\mathcal{R}\, Y$. However, for any adversary $\mathcal{A}'$ that makes $q \ll 2^n$ queries to a perfect random permutation $\Pi$ of the block space, finding $X$ such that $X \,\mathcal{R}\, \Pi(X)$, i.e. $\Pi(X) = E_X(X)$, is very unlikely to succeed: by separately considering the cases where $\mathcal{A}'$ outputs a value $X$ that belongs or does not belong to a queried pair, it can indeed be shown that the success probability of $\mathcal{A}'$ is upper bounded by $\frac{q}{2^n - q} + \frac{1}{2^n - q}$, and is therefore negligible if $q \ll 2^n$. □

The former proposition can be easily extended as follows.

Proposition 3. Every block cipher of block length $n$ bits and key length $k = Nn$ bits is vulnerable to a known-key distinguisher of order $N$ and complexity about $N$ computations of $E$⁸.

Proof sketch. We just need to replace the relation $\mathcal{R}$ used in the former proof by the following relation $\mathcal{R}_N$ over the $N$-tuples of blocks: if $\mathcal{X} = (X_i)_{i=1 \ldots N}$ and $\mathcal{Y} = (Y_i)_{i=1 \ldots N}$, then $\mathcal{X} \,\mathcal{R}_N\, \mathcal{Y}$ iff $\forall i \in [1; N]$, $E_{\mathcal{X}}(X_i) = Y_i$, where $E_{\mathcal{X}}$ denotes the block cipher $E$ parametrized by the $Nn$-bit key $X_1 \| X_2 \| \cdots \| X_N$. □
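The two generic distinguishers can be sketched in a few lines. The Python code below is an illustration only: `toy_cipher` is a hypothetical 16-bit keyed permutation standing in for $E$ (any block cipher would do), keys of length $Nn$ are formed by concatenating blocks, and algorithm $\mathcal{A}$ simply outputs the key blocks themselves together with their encryptions. With $N = 1$ it is the distinguisher of Proposition 2; with $N > 1$ it is the one of Proposition 3.

```python
import hashlib

BLOCK_BITS = 16
MASK = (1 << BLOCK_BITS) - 1


def toy_cipher(key: bytes, x: int) -> int:
    """Hypothetical 16-bit block cipher: an affine permutation with key-derived constants."""
    d = hashlib.sha256(key).digest()
    a = int.from_bytes(d[0:2], "big") | 1      # odd multiplier, hence invertible mod 2^16
    b = int.from_bytes(d[2:4], "big")
    c = int.from_bytes(d[4:6], "big")
    return (a * (x ^ c) + b) & MASK


def key_from_blocks(blocks) -> bytes:
    """Concatenate n-bit blocks X_1 || ... || X_N into an Nn-bit key."""
    return b"".join(v.to_bytes(2, "big") for v in blocks)


def relation_rn(xs, ys) -> bool:
    """R_N of Proposition 3 (N = 1 gives R of Proposition 2): E_{X_1||...||X_N}(X_i) == Y_i."""
    key = key_from_blocks(xs)
    return len(xs) == len(ys) and all(toy_cipher(key, x) == y for x, y in zip(xs, ys))


def generic_distinguisher(key_blocks):
    """Algorithm A: with known key K = K_1||...||K_N, output X_i = K_i and Y_i = E_K(K_i)."""
    key = key_from_blocks(key_blocks)
    xs = list(key_blocks)
    return xs, [toy_cipher(key, x) for x in xs]
```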


⁸ One can generalize the former result a bit further by noticing that if $k \le Nn$, then, given any easy to compute and easy to invert function $f: \{0,1\}^{Nn} \to \{0,1\}^k$, a simple variant of the known-key distinguisher of Proposition 3 can be obtained by replacing $\mathcal{R}_N$ by the relation $\mathcal{R}'_N$ defined by $\mathcal{X} \,\mathcal{R}'_N\, \mathcal{Y}$ iff $\forall i \in [1; N]$, $E_{f(\mathcal{X})}(X_i) = Y_i$.
To summarize the above impossibility results, for a block cipher $E$ of block and key lengths $n$ and $k$, generic known-key distinguishers of order $N$ are known to exist iff $k \le Nn$.

Discussion. If $k > Nn$, any known-key distinguisher of order at most $N$ can reasonably be conjectured to be meaningful, i.e. to reflect, unlike the artificial generic known-key distinguishers of Propositions 2 and 3, a meaningful correlation property of $E$. Now in the frequently encountered case where $k \le Nn$, which is met for instance by the known-key distinguisher of [?] where $k = 128$ and $Nn = 2^{56} \times 128$, characterizing which known-key distinguishers of order $N$ should be considered meaningful and which ones should be considered artificial is a very complex issue. Finding a complete characterization remains an open problem that even lacks a rigorous statement, and we will not attempt to solve it here. We will limit ourselves to proposing informal criteria allowing one to identify two classes of known-key distinguishers that have little to do with the artificial distinguishers identified so far and can both reasonably be considered meaningful.

- Informal criterion 1. One heuristic argument in favour of the view that the known-key distinguisher of Example 2 [?] for $\mathrm{AES}_7$ is meaningful is the observation that, while the descriptions of the generic relations $\mathcal{R}$ and $\mathcal{R}_N$ used in Propositions 2 and 3 involve the specification of $E$ itself, the relation $\mathcal{R}$ used in [?] has no obvious connection with the specification of $E$. More generally, if a known-key distinguisher uses an intractable relation $\mathcal{R}$ whose specification does not extensively reuse operations of $E$, this provides some heuristic evidence that it can be considered meaningful⁹.

- Informal criterion 2. While informal criterion 1 sounds like a reasonable sufficient condition, we think it should not be considered as a necessary condition. In other words, known-key distinguishers that do not satisfy it, i.e. whose relation $\mathcal{R}$ reuses some operations of $E$, should not be systematically ruled out as if they were all artificial. We informally state an alternative criterion for highlighting that, independently of whether their relation $\mathcal{R}$ reuses operations of $E$ or not, some known-key distinguishers have little to do with existing artificial distinguishers. One can observe that in the artificial distinguishers $(\mathcal{A}, \mathcal{R})$ of Propositions 2 and 3 and of the generalisation of Proposition 3 in the remark above, algorithm $\mathcal{A}$ produces an $N$-tuple $\mathcal{X}$ of input blocks from which the value of the whole key can be easily derived: in other words, one exploits the fact that $\mathcal{X}$ “encodes” the value of the entire key. If, for a given known-key distinguisher $(\mathcal{A}, \mathcal{R})$, the entire key can be derived neither from the $N$-tuple of input values $\mathcal{X}$ nor from the $N$-tuple of output values $\mathcal{Y}$ produced by $\mathcal{A}$, one is brought back to a situation somewhat similar to the case where $k > Nn$ (a condition that obviously prevents $\mathcal{X}$ and $\mathcal{Y}$ from encoding the entire key), and this provides some evidence that $(\mathcal{A}, \mathcal{R})$ has little to do with the artificial distinguishers identified so far. We will use this informal criterion at the end of the next section.


⁹ Giving a rigorous definition of the former informal criterion seems difficult. One might perhaps express that the verification of $\mathcal{R}$ is not substantially sped up by oracle accesses to $E$.
4 Application: Improved Known-Key Distinguishers for $\mathrm{AES}_8$ and $\mathrm{AES}_{10}$

4.1 A Known-Key Distinguisher for $\mathrm{AES}_8$

Let us now show how to use the first untwisted representation of $\mathrm{AES}_{2r}$ introduced in Section 2 in order to mount a known-key distinguisher of order $N = 2^{64}$ for $\mathrm{AES}_8$. The distinguisher starts from a suitably chosen middle $N$-block structure and exploits the forward and backward properties of the final rounds, resp. the initial rounds, of $\mathrm{AES}_8$, which are illustrated in Figure 3. These properties result from the fact that the initial and final rounds essentially consist of the composition $S \cdot R \cdot S$, up to simple initial and final transformations.

Property 1. For any structure $\mathcal{X}_{(a,b,c,d)} = \{(x \oplus a, b, c, d),\ x \in \{0,1\}^{32}\}$ of $2^{32}$ input blocks, where $(a, b, c, d)$ denotes an AES block of columns $a$, $b$, $c$, and $d$, each of the four 4-byte columns of the image of $\mathcal{X}_{(a,b,c,d)}$ by $S \cdot R \cdot S$ is uniformly distributed.

This can be easily seen by following the column-wise transitions through transformations $S$, $R$, and $S$ on the top of Figure 3 and by observing (1) that $S$ transforms each column bijectively and (2) that if one fixes the second, third, and fourth input columns of $R$, each of the four output columns of $R$ is a bijective affine function of the first input column. Since moreover $P \cdot SR \cdot AK$ is just a permutation of the byte positions followed by a key addition, each of the 16 bytes of the image of $\mathcal{X}_{(a,b,c,d)}$ by $S \cdot R \cdot S \cdot P \cdot SR \cdot AK$ is uniformly distributed and can be marked “U”.
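Observation (2) can be made concrete with the matrices $R_i$ of Section 2: when the second, third, and fourth input columns of $R$ are fixed, output byte $(i, j)$ equals $R_i[0][j] \cdot x_{i,0}$ plus a constant, where $x_{i,0}$ is byte $i$ of the first input column. Each output column is therefore a bytewise multiplication of the first input column by non-zero constants followed by a constant addition, hence a bijection. The Python sketch below (hypothetical helper names; the fixed contribution is an arbitrary illustrative constant) checks that multiplication by every coefficient that occurs permutes $GF(2^8)$.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


# Row 0 of R_0, R_1, R_2, R_3: the coefficient of x_{i,0} in output byte (i, j) is FIRST_ROWS[i][j].
FIRST_ROWS = [[2, 3, 1, 1], [1, 1, 2, 3], [2, 3, 1, 1], [1, 1, 2, 3]]


def multiplication_is_bijective(c: int) -> bool:
    """True if x -> c * x permutes GF(2^8)."""
    return sorted(gf_mul(c, x) for x in range(256)) == list(range(256))


def column_j_from_first_column(col0, j, fixed_part):
    """Output column j of R when only input column 0 varies; fixed_part gathers the
    contribution of the fixed columns 1-3 and of the constant of R."""
    return tuple(gf_mul(FIRST_ROWS[i][j], col0[i]) ^ fixed_part[i] for i in range(4))
```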

Property 2. For any structure $\mathcal{Y}_{(e,f,g,h)} = \{(y \oplus e, f, g, h),\ y \in \{0,1\}^{32}\}$ of $2^{32}$ blocks, each four-byte column of the preimage of $\mathcal{Y}_{(e,f,g,h)}$ by $S \cdot R \cdot S$ is uniformly distributed.

This can be easily seen by following the column-wise transitions through transformations $S^{-1}$, $R^{-1}$, and $S^{-1}$ on the bottom of Figure 3 and observing (1) that $S^{-1}$ transforms each column bijectively and (2) that if one fixes the second, third, and fourth input columns of $R^{-1}$, each of the four output columns of $R^{-1}$ is a bijective affine function of the first input column. Since moreover $IP^{-1}$ is a permutation of the byte positions, each of the 16 bytes of the preimage of $\mathcal{Y}_{(e,f,g,h)}$ by $AK \cdot IP \cdot S \cdot R \cdot S$ is uniformly distributed and can be marked “U”.

We use these properties to mount the known-key distinguisher of order $N = 2^{64}$ for $\mathrm{AES}_8$ illustrated in Figure 4, i.e. an algorithm $\mathcal{A}$ allowing one to efficiently derive from any known key an $N$-tuple $((X_i, Y_i))_{i=1 \ldots N}$ of $\mathrm{AES}_8$ (input, output) pairs that satisfy the relation $\mathcal{R}$ defined as follows.
Fig. 3. Forward and backward properties of $S \cdot R \cdot S$


Relation $\mathcal{R}$: $(X_i)_{i=1 \ldots N} \,\mathcal{R}\, (Y_i)_{i=1 \ldots N}$ iff the $N$ blocks $X_i$ are pairwise distinct and, for each byte position $j \in \{0, \ldots, 15\}$, the $j$-th byte of the $X_i$ and the $j$-th byte of the $Y_i$ are uniformly distributed.
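Checking $\mathcal{R}$ only requires counting byte values, which is why requirement (ii) of Definition 2 is easy to meet here. The Python sketch below (hypothetical names, blocks as 16-byte strings) implements the check; its usage is demonstrated on a toy $N = 256$ rather than $N = 2^{64}$, purely to illustrate the check itself.

```python
from collections import Counter


def byte_is_uniform(blocks, j):
    """True if byte j takes each of the 256 values equally often across the blocks."""
    counts = Counter(b[j] for b in blocks)
    return len(counts) == 256 and len(set(counts.values())) == 1


def relation_r(xs, ys):
    """Relation R of the AES_8 distinguisher on N-tuples of 16-byte blocks."""
    return (len(xs) == len(ys)
            and len(set(xs)) == len(xs)
            and all(byte_is_uniform(xs, j) and byte_is_uniform(ys, j) for j in range(16)))
```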

Algorithm A: The guiding idea is that in the untwisted representation of
AES_8 in Figure 4, the initial and final rounds of Figure 3 are linked together by
the transformation R, which is affine. This allows us to construct a structure that
simultaneously achieves the requirements on the input and the output of R^{-1}
in order to apply Properties 1 and 2. In more detail, we use the structure of 2^64
chosen middle blocks Z = T_0 ⊕ R𝒯_0, where T_0 and 𝒯_0 are shorthands for
T_(0,0,0,0) and 𝒯_(0,0,0,0), and T_0 ⊕ R𝒯_0 denotes the set {X ⊕ R(Y), X ∈ T_0,
Y ∈ 𝒯_0}. It directly results from the definition of Z that it can be partitioned into
2^32 structures T_0 ⊕ R(y, 0, 0, 0) of 2^32 blocks each, one for each
value y ∈ {0,1}^32. In other words, Z can be partitioned into 2^32 structures of
the form T_(a,b,c,d). Therefore, due to Property 1, each byte of the image of Z by
S • R • S • P • SR • AK satisfies property U. Let us denote by L and C the linear and
constant parts of the affine mapping R, i.e. the linear mapping and the constant
such that ∀X ∈ {0,1}^128, R(X) = L(X) ⊕ C. Since the linear mapping and the
constant associated with R^{-1} are L' = L^{-1} and C' = L^{-1}(C), the preimage of
Z by R is R^{-1}(Z) = L^{-1}(T_0 ⊕ R𝒯_0) ⊕ C' = L^{-1}(T_0) ⊕ 𝒯_0. Therefore
R^{-1}(Z) can be partitioned into 2^32 structures 𝒯_0 ⊕ L^{-1}(x, 0, 0, 0)
of 2^32 blocks each¹⁰, one for each value x ∈ {0,1}^32. In other words, R^{-1}(Z)
can be partitioned into 2^32 structures of the form 𝒯_(e,f,g,h), and the application
of Property 2 to R^{-1}(Z) shows that each byte of the preimage of R^{-1}(Z) by
AK • P • S • R • S, i.e. each byte of the preimage of Z by AK • P • S • R • S • R,
satisfies property U.
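The identity R^{-1}(Z) = L^{-1}(T_0) ⊕ 𝒯_0 only uses the fact that R is affine. The Python sketch below checks it on a toy 8-bit affine map. The linear part `lin`, the constant `C`, and the two 16-element sets standing in for T_0 and 𝒯_0 are hypothetical choices. They are not the AES transformation R or the paper's structures.

```python
W = 8                    # toy block width in bits (assumption)
MASK = (1 << W) - 1
C = 0x63                 # hypothetical constant part of the toy affine map R


def lin(x):
    """Toy invertible GF(2)-linear map L: x xor (x << 1), unit lower-triangular."""
    return x ^ ((x << 1) & MASK)


# Invert L by exhaustive tabulation, which is fine for 8 bits.
LIN_INV = [0] * (1 << W)
for v in range(1 << W):
    LIN_INV[lin(v)] = v


def R(x):
    return lin(x) ^ C


def R_inv(y):
    # R^{-1}(Y) = L^{-1}(Y) xor C', with C' = L^{-1}(C)
    return LIN_INV[y] ^ LIN_INV[C]


T0 = set(range(16))                    # stand-in for T_0
T0_cal = {v << 2 for v in range(16)}   # stand-in for calligraphic T_0

Z = {t ^ R(s) for t in T0 for s in T0_cal}
lhs = {R_inv(z) for z in Z}                          # R^{-1}(Z)
rhs = {LIN_INV[t] ^ s for t in T0 for s in T0_cal}   # L^{-1}(T_0) xor T_0-cal
print(lhs == rhs)  # True
```

The constant C cancels because L^{-1} is linear. That cancellation is the only property of R the argument needs.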

In summary, we derived from the middle structure Z an N-tuple ((X_i, Y_i))_{i=1...N}
of AES_8 (input, output) pairs that satisfy relation ℛ. The time complexity of the
derivation of such an N-tuple is T = N = 2^64 AES_8 computations. To complete
the proof that we have mounted a known-key distinguisher for AES_8, we just have
to show that property ℛ is T-intractable, i.e. that the success probability of any

10 One can notice that the above partitions of Z and R^{-1}(Z) do not map into each
other through R.
Fig. 4. A known-key distinguisher for AES_8


oracle algorithm A^{Π,Π^{-1}} of overall time complexity upper bounded by N (and
therefore of number q of queries also upper bounded by N) is negligible.

Proposition 4. For any oracle algorithm A that makes q ≤ N = 2^64 oracle
queries to a perfect random permutation Π of {0,1}^n (where n = 128) or its
inverse, the probability that A successfully outputs an N-tuple ((X_i, Y_i))_{i=1...N}
of (input, output) pairs of Π that satisfy ℛ is upper bounded by 1/(2^n − (N − 1)),
and hence by 2^{-127}.

Proof. If at least one of the N pairs (X_i, Y_i) output by A does not result from
the query X_i to Π or the query Y_i to Π^{-1}, then the probability that Y_i = Π(X_i)
holds for this pair, and thus the success probability of A, is upper bounded by
1/(2^n − (N − 1)). In the opposite case, i.e. if q = N and all the (X_i, Y_i) result
from queries to Π or Π^{-1}, we can assume w.l.o.g. that (X_N, Y_N) results from the
N-th query X_N or Y_N of A to Π or Π^{-1}. But given any N − 1 pairs
(X_i, Y_i)_{i=1...N−1}, at most one value of the block Y_N, resp. X_N, is such that
each of the 16 bytes of (Y_i)_{i=1...N}, resp. (X_i)_{i=1...N}, is uniformly distributed¹¹.
However, the oracle answer Y_N, resp. X_N, is uniformly drawn from
{0,1}^n \ {Y_1, ..., Y_{N−1}}, resp. {0,1}^n \ {X_1, ..., X_{N−1}}.
Therefore the probability that the answer to the N-th query allows the output
of A to satisfy property ℛ is also upper bounded by 1/(2^n − (N − 1)) in this case. □
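The "at most one value" step of the proof of Proposition 4 can be illustrated with the XOR argument of footnote 11. In the Python sketch below, the function names are hypothetical and the data are toy N = 256 two-byte blocks. The only block that can complete N − 1 given blocks into an N-tuple with uniformly distributed bytes is the XOR of those N − 1 blocks. The last lines evaluate the resulting bound 1/(2^n − (N − 1)) for n = 128 and N = 2^64.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a non-empty list of equal-length byte strings."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)


def only_candidate_last_block(first_blocks):
    """If N blocks have every byte position uniformly distributed, their XOR is
    zero, so the N-th block must equal the XOR of the other N - 1 blocks.
    This is a necessary condition: at most one candidate exists."""
    return xor_blocks(first_blocks)


# Toy N-tuple of N = 256 two-byte blocks with uniform byte positions.
ys = [bytes([255 - i, i ^ 0x5A]) for i in range(256)]
print(xor_blocks(ys) == bytes(2))                     # True
print(only_candidate_last_block(ys[:-1]) == ys[-1])   # True

# The answer to the N-th query is uniform over 2^n - (N - 1) values, so it
# hits the single admissible block with probability at most:
n, N = 128, 2**64
bound = 1 / (2**n - (N - 1))
print(bound <= 2.0**-127)                             # True
```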

Discussion. The known-key distinguisher of order N = 2^64 for AES_8 presented
above has a time complexity of about 2^64. It is obviously applicable without
modification to the variant of AES_8 parametrized by independent subkeys, AES*_8.
In both cases, the fact that the informal criterion of Section 3 is met, i.e. that
the relation ℛ used by the distinguisher has no obvious connection with the
AES specification, suggests that the obtained known-key distinguisher can be
considered meaningful. While the presented 8-round known-key distinguisher is
outperformed by the differential known-key distinguishers for AES_8 of
complexities 2^48 and 2^44 of [12,14], the strong property expressed by relation ℛ that

11 This can for instance be deduced from the fact that the X_i and the Y_i must satisfy
⊕_{i=1}^{N} X_i = ⊕_{i=1}^{N} Y_i = 0.
each input and output byte is not only balanced, as in zero-sum distinguishers,
but uniformly distributed turns out to be convenient for further extending the
known-key distinguisher by two rounds in a provable manner, as will be shown
in the rest of this section.


Strengthening Proposition 4 Under a Heuristic Assumption. Let us give
some partial evidence that ℛ is actually T-intractable in a stronger sense than in
Proposition 4 above, namely that the success probability of any adversary A who
makes M > N oracle queries to Π or Π^{-1} remains negligible if M − N is not too
large. While a rigorous proof requiring no unproven assumptions could easily be
derived along the same lines as Proposition 4 for values of M marginally larger
than N, e.g. N + 3, for larger values of M we make the heuristic assumption
that querying both Π and Π^{-1} does not improve the performance of A over
an adversary who only queries one of these oracles. Therefore, we consider an
adversary A who only makes queries to an oracle permutation Π, not its inverse,
and aims at finding an N-tuple of (input, output) pairs that satisfy the relation
ℛ of Section 4.1. To upper bound the success probability of such an adversary,
we observe that given any N-tuple of distinct input blocks X_i and any output
byte position j ∈ [0; 15], the 256-tuple (N_0, ..., N_255) of numbers of occurrences
of the values 0, 1, ..., 255 for byte j of the blocks Y_i = Π(X_i) is nearly governed
by a multinomial law. For any 256-tuple (N_0, ..., N_255) such that Σ_{k=0}^{255} N_k = N,
we denote the multinomial coefficient N!/(N_0! ··· N_255!) by $\binom{N}{N_0, \ldots, N_{255}}$.


Proposition 5. For any N-tuple (X_i)_{i=1...N} of distinct inputs to Π, an upper
bound on the probability p that, for byte positions j = 0 to 15, the 256-tuple of
numbers of occurrences of the values of byte j of the Π(X_i) be (N^j_0, ..., N^j_255),
where for j = 0 to 15 the 256-tuple (N^j_0, ..., N^j_255) satisfies Σ_{k=0}^{255} N^j_k = N, is
given by:

$$p \le \prod_{j=0}^{15} \binom{N}{N^j_0, \ldots, N^j_{255}} \times \left(\frac{1}{2^n - N + 1}\right)^{N}$$

An upper bound on the success probability p_A of an adversary A who makes
M > N queries to Π and no query to Π^{-1} is given by:
Since N = 2^64, Proposition 5 provides very small upper bounds p_A ≪ 1 for
values of M of up to M ≈ N + 2^11. But it provides no bound p_A < 1 for
slightly larger values, e.g. M ≈ N + 2^12. We do not know whether the bounds
of Proposition 5, which relate to the probability that the (input, output) pairs
provided by M queries contain one such N-tuple, can be significantly improved.
Since even in a situation where such N-tuples exist it can be computationally
difficult to find one in time T, a potential approach might consist in establishing
upper bounds that hold for higher values of M under computational assumptions.
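The bound on p in Proposition 5 can be evaluated exactly for toy parameters with Python's arbitrary-precision integers. The sketch below uses assumed toy parameters: n = 16-bit blocks, so only two byte positions, and N = 512, with every byte value occurring exactly twice at each position. The function names are hypothetical. It does not reproduce the paper's setting n = 128, N = 2^64, where exact evaluation is infeasible and asymptotic estimates are needed.

```python
import math


def multinomial(counts):
    """N! / (N_0! * ... * N_255!) for N = sum(counts), computed exactly."""
    result = math.factorial(sum(counts))
    for c in counts:
        result //= math.factorial(c)   # every partial quotient is an integer
    return result


def log2_prop5_bound(count_tuples, n):
    """log2 of prod_j multinomial(N; N_0^j, ..., N_255^j) * (1 / (2^n - N + 1))^N."""
    N = sum(count_tuples[0])
    numerator = 1
    for counts in count_tuples:
        numerator *= multinomial(counts)
    denominator = (2**n - N + 1) ** N
    return math.log2(numerator) - math.log2(denominator)


# Toy parameters (assumed): 2 byte positions, N = 512, each value occurring twice.
uniform = [2] * 256
print(log2_prop5_bound([uniform, uniform], n=16))
```

Working in log2 of exact integers avoids the floating-point underflow that a direct product of probabilities would hit.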
4.2 A Known-Key Distinguisher for the 10-Round AES

In this section we show that the former known-key distinguisher for AES_8 can
be extended by two rounds without significant complexity increase. The price to
pay for this extension is that the relation ℛ of the new distinguisher is much less
simple and that its description involves operations of the first and last rounds.
This raises the question whether the new known-key distinguisher reflects a
meaningful correlation property of the cipher. Since we can provide simpler
arguments supporting this view for AES*_10 (i.e. the 10-round AES parametrized
by 11 independent subkeys), we first describe the application of the new known-key
distinguisher to AES*_10 and then discuss how this transposes to AES-128.

As shown at the end of Section 2, AES*_10 can be equivalently represented by
the sequence of transformations

$$AK \bullet P^{-1} \bullet SB \bullet R \bullet (S \bullet R)^4 \bullet SB \bullet Q^{-1} \bullet AK$$

The properties we are using to build a known-key distinguisher on AES*_10 are
illustrated in Figure 5.
Fig. 5. Derivation of the N AES_10 (input, output) pairs used in our known-key
distinguisher


Algorithm A: We reuse the same structure Z of N = 2^64 intermediate blocks
as for the known-key distinguisher on AES_8 presented above, but extend the
forward and backward computations S • P • S and (S • R • S • P)^{-1} by
two outer transformations whose structures are symmetric to each other, namely
(AK • P^{-1} • SB • R)^{-1} (backward) and R • SB • Q^{-1} • AK (forward), to get an
N-tuple of related AES*_10 inputs and outputs. As shown in the former subsection,
the inputs to the forward and backward outer transformations each consist of
four columns that are uniformly distributed; therefore each of the 16 bytes
of each of these two states U and V is uniformly distributed and can be marked
U. However, these states are related to the AES*_10 inputs X_i and to the AES*_10
outputs Y_i by the outer transformations.

This implies that if we denote by α and β the 128-bit states P^{-1}(K_0) and
Q(K_10), the N-tuples 𝒳 = (X_i)_{i=1...N} and 𝒴 = (Y_i)_{i=1...N} are related by the
key-dependent relation ℛ_{α,β} defined as follows: 𝒳 ℛ_{α,β} 𝒴 if and only if each byte of
R ∘ SB(P^{-1}(X_i) ⊕ α) and each byte of R^{-1} ∘ SB^{-1}(Q(Y_i) ⊕ β) is uniformly
distributed. We can now define the following relation ℛ over the N-tuples of blocks:

Relation ℛ: Given two N-tuples 𝒳' = (X'_i)_{i=1...N} and 𝒴' = (Y'_i)_{i=1...N}, 𝒳' ℛ 𝒴'
if and only if all the X'_i, i = 1...N, are pairwise distinct and there exists a pair
α', β' of 128-bit states such that 𝒳' ℛ_{α',β'} 𝒴'.
It is important to understand that though relation ℛ reflects the existence of
values α' and β' that can be conveniently interpreted as subkeys, checking ℛ
does not take as input any key or subkey: given two N-tuples 𝒳' and 𝒴' that
can possibly be derived from a random key value K by algorithm A, whether
𝒳' ℛ 𝒴' holds must be efficiently checkable without providing the verifier with K or
any other side information about suitable values of α' and β'.

It immediately results from the definition of ℛ that the N-tuples 𝒳 and 𝒴
derived as described in Figure 5 satisfy property ℛ, and the complexity of the
derivation algorithm A is T = N = 2^64. To complete the proof that (ℛ, A) is
a known-key distinguisher for AES*_10, we just have to show that ℛ is efficiently
checkable and T-intractable.

ℛ is Efficiently Checkable. Though the involvement in ℛ of the 128-bit constants
α' and β' might suggest that checking ℛ has a huge complexity, this is not the
case, because the existence of 128-bit states α', β' such that 𝒳' ℛ_{α',β'} 𝒴' can be
split into independent conditions. Let us denote by sb : {0,1}^32 → {0,1}^32 a
parallel application of four AES S-boxes that from a four-byte row produces a
four-byte output row. For j = 0 to 3, let us denote by row_j the mapping that
from a 128-bit state outputs the row numbered j of this state, and by R_j the
linear transformation of row j introduced in Section 2. It is easy to see that the
existence of α' and β' is equivalent to the existence of eight 32-bit constants
α'_j, j = 0...3, and β'_j, j = 0...3 (representing the rows of α' and β'), such
that for j = 0...3 each of the four bytes of R_j ∘ sb ∘ row_j(P^{-1}(X'_i) ⊕ α'_j) and
R_j^{-1} ∘ sb^{-1} ∘ row_j(Q(Y'_i) ⊕ β'_j) is uniformly distributed. This can easily be checked
by first computing the number of occurrences of each of the 2^32
possible values of the 32-bit words row_j(P^{-1}(X'_i)) and row_j(Q(Y'_i)), j = 0...3,
and then using the obtained frequency distributions in a second step for
computing, for j = 0 to 3 and each of the 2^32 possible values of α'_j, resp. β'_j,
the resulting frequency distribution of R_j ∘ sb ∘ row_j(P^{-1}(X'_i) ⊕ α'_j), resp.
R_j^{-1} ∘ sb^{-1} ∘ row_j(Q(Y'_i) ⊕ β'_j), and checking that at least one of them induces a
balanced distribution for each byte position. Since the first step requires 2^64 very
simple operations that are much less complex than one operation of AES*_10, and
the second step again requires 8 times 2^64 very simple operations, the overall
complexity of checking ℛ is strictly smaller than N = 2^64 AES*_10 operations.
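The two-step check described above can be illustrated on a toy scale. The following Python sketch is only an analogue built on assumptions and not the paper's implementation:

- a "row" is shrunk from 32 to 8 bits, made of two 4-bit cells;
- `sb` applies the 4-bit PRESENT S-box, standing in for the AES S-box, to each cell;
- `R_j` is a hypothetical invertible linear map on the row.

Step 1 tabulates the multiset of row values. Step 2 tries every constant α'_j and keeps those for which each cell of R_j ∘ sb(row ⊕ α'_j) is balanced. The symmetric check on the Q(Y'_i) side with R_j^{-1} ∘ sb^{-1} is omitted.

```python
from collections import Counter

# 4-bit PRESENT S-box, standing in for the 8-bit AES S-box in this toy model.
SBOX4 = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
         0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOX4_INV = [SBOX4.index(v) for v in range(16)]


def sb(row):
    """Apply the toy S-box to both 4-bit cells of an 8-bit row."""
    return (SBOX4[row >> 4] << 4) | SBOX4[row & 0xF]


def sb_inv(row):
    return (SBOX4_INV[row >> 4] << 4) | SBOX4_INV[row & 0xF]


def R_j(row):
    """Hypothetical linear map (hi, lo) -> (hi ^ lo, lo); it is an involution."""
    hi, lo = row >> 4, row & 0xF
    return ((hi ^ lo) << 4) | lo


def cells_balanced(freq):
    """freq maps row values to multiplicities; check that both cells are uniform."""
    n = sum(freq.values())
    for shift in (4, 0):
        cell = Counter()
        for value, mult in freq.items():
            cell[(value >> shift) & 0xF] += mult
        if len(cell) != 16 or any(m * 16 != n for m in cell.values()):
            return False
    return True


def valid_constants(rows):
    # Step 1: one pass over the N rows to get the frequency of each row value.
    freq = Counter(rows)
    # Step 2: for each candidate constant, push the frequency table (not the
    # N rows) through R_j o sb o (xor constant) and test the cell distributions.
    found = []
    for alpha in range(256):
        image = Counter()
        for value, mult in freq.items():
            image[R_j(sb(value ^ alpha))] += mult
        if cells_balanced(image):
            found.append(alpha)
    return found


# Toy rows built to satisfy the condition for a hidden constant alpha0.
alpha0 = 0x3A
targets = [(a << 4) | a for a in range(16)]       # both cells uniform
rows = [sb_inv(R_j(u)) ^ alpha0 for u in targets]
print(alpha0 in valid_constants(rows))            # True
```

The cost of step 2 is (number of constants) × (number of distinct row values) table updates, independent of how many blocks share a row value. This mirrors the 8 × 2^64 count given in the text.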

Remark. The reader might wonder whether the technique we used to derive a
known-key distinguisher for the 10-round AES from a known-key distinguisher
for the 8-round AES, by expressing that the 10-round inputs and outputs are
related (by one outer round at each side) to intermediate blocks that satisfy the
relation used by the 8-round distinguisher, does not allow us to extend this 8-round
known-key distinguisher by an arbitrary number of rounds. If this were the case, this
would of course render this technique highly suspicious. It is easy, however, to see
that the argument showing that the 10-round relation ℛ is efficiently checkable does
not transpose to showing that the relations over r > 10 rounds, which one could derive
from the 8-round relation by expressing that the r-round inputs and outputs are
related by r − 8 > 2 outer rounds to intermediate blocks that satisfy the 8-round
relation, are efficiently checkable. To complete this remark, we explain at the end
of this section why the 2-round extension technique we used is not generically
applicable to extend any r-round known-key distinguisher to an (r + 2)-round
distinguisher.

ℛ is T-Intractable. In order to show that relation ℛ is T-intractable, we now
have to prove that the success probability of any oracle algorithm of overall time
complexity upper bounded by N = 2^64 (and therefore of number q of queries
also upper bounded by N) is negligible.

Proposition 6. For any oracle algorithm A that makes q ≤ N = 2^64 oracle
queries to a perfect random permutation Π of {0,1}^128 or Π^{-1}, the probability
that A outputs an N-tuple ((X_i, Y_i))_{i=1...N} that satisfies ∀i ∈ [1; N] Y_i = Π(X_i)
and also satisfies ℛ is upper bounded by 2^256 × (5^16/(2^128 − (N − 5)))^3 ≈ 2^{-16.5}.

Proof. If at least one of the N pairs (X_i, Y_i) output by A does not result from
a query X_i to Π or a query Y_i to Π^{-1}, then the probability that Y_i = Π(X_i) holds
for this pair, and consequently the success probability of A, is upper bounded by
1/(2^128 − (N − 1)). So from now on we only consider the opposite case, i.e. q = N and
all the (X_i, Y_i) result from queries to Π or Π^{-1}. Given any two 128-bit words α
and β, let us upper bound the probability that A outputs an N-tuple (X_i, Y_i)
that satisfies ∀i ∈ [1; N] Y_i = Π(X_i) and the relation ℛ_{α,β}. The guiding
idea is that the constraints on the very last queries to the oracle (Π, Π^{-1}) in
order for ℛ_{α,β} to hold are so strong that this is extremely unlikely to happen. For
the sake of simplicity of this proof, we consider the last 5 queries
of A to the oracle (Π, Π^{-1}): indeed, while considering the d last queries, d > 5,
might have led to a tighter upper bound, the chosen value of 5 is sufficient for
establishing a suitable upper bound. Since the 5 last queries contain at least 3
queries to either Π or Π^{-1}, we can assume w.l.o.g. that they contain at least
3 queries X, X', and X'' to Π, and we denote the corresponding responses
by Y, Y', and Y''. In order for the property ℛ_{α,β} to be satisfied, for each
byte position j ∈ [0; 15], the set of byte values
B_j = {b ∈ [0; 255] | #{i ∈ [1; N − 5] | R^{-1} ∘ SB^{-1}(Q(Y_i) ⊕ β)[j] = b} ≠ 2^56}
must contain at most 5 elements (since the last 5 queries can affect the number of
occurrences of at most 5 of the 256 byte values, and all the unaffected numbers of
occurrences must already be 2^56). Furthermore, in order for property ℛ_{α,β} to be
satisfied, one must have ∀i ∈ [N − 4; N] R^{-1} ∘ SB^{-1}(Q(Y_i) ⊕ β)[j] ∈ B_j, i.e.
∀i ∈ [N − 4; N] Y_i ∈ S = Q^{-1}(SB ∘ R(∏_{j=0}^{15} B_j) ⊕ β). Since Q, SB, R, and the
XOR with β are bijective, the set S defined above contains #S = #∏_{j=0}^{15} B_j
elements (where ∏_{j=0}^{15} B_j denotes the Cartesian product of the B_j). Since for
j = 0 to 15 #B_j ≤ 5, #∏_{j=0}^{15} B_j ≤ 5^16 and hence #S ≤ 5^16. Therefore the
probability that the three blocks Y, Y', and Y'' all belong to S is upper bounded by
(5^16/(2^128 − (N − 5)))^3. By summing the obtained upper bound over all the 2^256
possible values of (α, β), one gets the claimed upper bound
2^256 × (5^16/(2^128 − (N − 5)))^3 ≈ 2^{-16.5} on the probability that ℛ is satisfied. □
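The numerical value of the bound in Proposition 6 can be double-checked with exact rational arithmetic. The short Python sketch below evaluates 2^256 × (5^16/(2^128 − (N − 5)))^3 for N = 2^64 and prints its base-2 logarithm. The variable names are illustrative only.

```python
import math
from fractions import Fraction

N = 2**64
# 2^256 choices of (alpha, beta), times the probability that the three
# responses Y, Y', Y'' all fall in a set S of at most 5^16 blocks.
bound = Fraction(2**256) * Fraction(5**16, 2**128 - (N - 5)) ** 3
print(round(math.log2(bound.numerator) - math.log2(bound.denominator), 2))  # -16.55
```

The dominant terms are 256 + 3 × (16 × log2 5 − 128) ≈ −16.55. The correction from N − 5 in the denominator is negligible, which matches the "≈ 2^{-16.5}" stated above.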
In order to give partial evidence that ℛ is not only N-intractable, as shown
in Proposition 6 above, but remains M-intractable for M > N if M − N is not
too large, we can make the heuristic assumption that the success probabilities
of adversaries who are allowed to make oracle queries to both Π and Π^{-1} and of
adversaries who are allowed to make oracle queries to Π only have the same
upper limit. Proposition 5 can be transposed to the 10-round relation ℛ, up to
a multiplication of the upper bounds obtained for p and p_A by 2^256. This
multiplicative factor does not strongly affect the values of M − N one can reach, and one
still gets very small upper bounds p_A ≪ 1 for values of M of up to M ≈ N + 2^11.

The Former 2-Round Extension Technique is Not Generic. The reader
might wonder why the two-round extension technique introduced above does not
allow us to extend any r-round known-key distinguisher to an (r + 2)-round known-key
distinguisher. There are two reasons that can make such an extension fail: firstly,
unlike the r-round relation it is derived from, the (r + 2)-round relation may not
be efficiently checkable; secondly, unlike the r-round relation it is derived from,
the (r + 2)-round relation may be insufficiently intractable to mount an (r + 2)-round
distinguisher. This second situation occurs in the case of the 8-round differential
relation ℛ_8 of order 2 used in [12]. In the full version of this paper we show
that unlike ℛ_8, which is T-intractable for T = 2^48, the 10-round relation ℛ_10
derived from ℛ_8 is not intractable at all for T = 2^48, but simple to achieve with
a probability of about 0.97 with only two queries to a perfect random permutation
Π and no extra operation. In other words, the transposition of our technique
to the 8-round distinguisher of [12] does not allow us to derive a valid 10-round
distinguisher.

In the full version of this paper, we also show that giving a rigorous proof (as was
done in Proposition 6) seems technically difficult for a potential 10-round distinguisher
based on the stronger property that several pairs satisfying the differential relation
of [12] can be derived (a property reflected by a higher-order relation than ℛ_8),
although we do not preclude that such a distinguisher might outperform the 10-round
distinguisher presented above. We leave the investigation of improved 10-round
known-key distinguishers and associated proofs (or even plausible heuristic
arguments, if rigorous proofs turn out to be too difficult to obtain) as an open issue.


Discussion. The known-key distinguisher (ℛ, A) of order N = 2^64 for AES*_10
presented above has a time complexity of about 2^64. Unlike in the former 8-round
known-key distinguisher, the relation ℛ involves operations of the AES.
However, it is easy to show that the alternative criterion at the end of Section
3 for differentiating certain known-key distinguishers from the artificial known-key
distinguishers that result from generic impossibility results is applicable.
Indeed, the derivation by A of the input N-tuple (X_i)_{i=1...N} from the intermediate
structure Z only involves the 6 first subkeys K_0 to K_5, and the derivation by
A of the output N-tuple (Y_i)_{i=1...N} from the same structure only involves the
5 last subkeys K_6 to K_10. Consequently the 5 last subkeys cannot be derived
from (X_i)_{i=1...N}, and thus the input N-tuples do not “encode” the entire key.
Similarly, the 6 first subkeys cannot be derived from (Y_i)_{i=1...N}, and thus the
output N-tuples do not “encode” the entire key. This suggests that the obtained
known-key distinguisher for AES*_10 can reasonably be considered meaningful.

While the former known-key distinguisher is obviously applicable without any
modification to AES_10, i.e. the full AES-128, the former argument vanishes in
this case because all subkeys are related by the key schedule: the first subkey,
resp. the last subkey, can actually be derived from the input, resp. the output
N-tuple, and because of the key schedule relations this determines the entire key.
This does not mean that, when applied to AES_10, the former distinguisher becomes
artificial. Actually, the fact that the very same distinguisher is applicable
to AES*_10 gives a hint that it can still be considered meaningful¹².

5 Conclusion 

As said before, the untwisted representation of AES introduced in this paper is
not exclusively intended for the analysis of the security of AES in the known-key
model. We think, however, that the fact that this representation was used to find the
two known-key distinguishers presented in Section 4 provides some evidence that
this representation is well suited for analysing the resistance of (a reduced-round
version of) AES against some structural attacks.

Whether there exists a simpler 10-round known-key or even chosen-key
distinguisher for AES than the 10-round known-key distinguisher presented in
this paper, allowing one to highlight a less tenuous deviation from the behaviour
of a perfect random permutation, resp. of an ideal cipher, remains an interesting
open question.
Abstract. In 2013, Standaert et al. proposed the notion of simulatable
leakage to connect theoretical leakage resilience with the practice of
side channel attacks. Their use of simulators, based on physical devices,
to support proofs of leakage resilience allows verification of underlying
assumptions: the indistinguishability game, involving real vs. simulated
leakage, can be ‘played’ by an evaluator. Using a concrete, block cipher
based leakage resilient PRG and a high-level simulator definition (based on
concatenating two partial leakage traces), they included detailed reasoning
on why said simulator (for AES-128) resists state-of-the-art side channel
attacks.

In this paper, we demonstrate a distinguisher against their simulator
and thereby falsify their hypothesis. Our distinguishing technique,
which is evaluated using concrete implementations of the Standaert et
al. simulator on several platforms, is based on ‘tracking’ consistency
(resp. identifying simulator inconsistencies) in leakage traces by means
of cross-correlation. In an attempt to rescue the approach, we propose
several alternative simulator definitions based on splitting traces at points
of low intrinsic cross-correlation. Unfortunately, these come with significant
caveats, and we conclude that the most natural way of producing
simulated leakage is by using the underlying construction ‘as is’ (but
with a random key).

Keywords: leakage resilience, side channel attack, simulatable leakage, 
cross-correlation. 


1 Introduction 

At Crypto’13, Standaert et al. [19] proposed a new notion for leakage resilience
involving simulators. The intuition behind their proposal is that if an adversary
cannot tell the difference between real leakage and simulated leakage (from a
simulator that does not know the secret key), then clearly the leakage does not
reveal any information about the secret key. This offers a middle ground which
connects theorists (who desire provably secure schemes) and practitioners (who
require empirically verifiable constructions). In this paper we show that while
this is a step in the right direction in terms of modelling leakage, the specific
simulator given for their construction is in fact distinguishable. We explain why
this is the case, and show how to resolve the problem so that their theoretical
proof still holds.


1.1 What is Leakage? 

A fundamental discrepancy between theory and practice lies in the different
understanding of what constitutes ‘leakage’. When examining the vast literature
on side channels and leakage resilience, there seem to be three different
understandings of what constitutes a leakage function. The first two of these ideas
are typically found in works written by practitioners (such as [8,11]) and centre
around a mathematical description of the physical nature of real-world leakage
traces. The third, in contrast, seeks to define leakage functions in more general
and powerful terms, and is often found in theoretical contributions.

Leakage understood as the modelling of the physical nature of the observed leakage
points. The first understanding is that a leakage function is the mathematical
function describing the shape and form of points in leakage traces. Such
a function then models the manner in which the operations and data act on
the physical environment, alongside other electrical components including the
measurement apparatus, and the environmental conditions. This understanding
of leakage implies that the leakage function fundamentally depends on how the
leakage is acquired (because it includes the measurement apparatus). It also
implies that the leakage function, whilst being key dependent, is in principle
unbounded: every new measurement gives more information.

Leakage understood as modelling of the exploitable information about the key.
The second understanding is that of a mathematical function that again
describes the leakage traces; however, the conceptual emphasis is that the term
‘leakage’ refers to leakage about the key. A key must have a finite length and
therefore the leakage function can never reveal more than that amount of
information. It is hence a function that could (depending on the number of queries to
the function), under ideal circumstances, reveal the entire key information but
no more than that.

Leakage understood as a mathematical concept largely separated from any
physical interpretation. The third understanding is that leakage is a function that
has certain restrictions (for example, see [4,15]) such that defining cryptography
is still possible, but is otherwise meant to be as general and powerful as possible.
Consequently, a direct physical interpretation is not intended as such; rather, the
idea pursued is that any ‘realistic’ leakage function is included given the general
nature of the definition. The discrepancy that arises from this perspective is that
for practitioners, any overhead that is incurred because of ‘proof relics’ or
protection against ‘magic’ such as future computation attacks [18] is an unnecessary
expense.


1.2 Simulatable Leakage 

The recent contribution by Standaert et al. [19] hence comes as a welcome
addition to the current approaches for dealing with the concept of leakage as part
of provable security. In a nutshell, the authors suggest that a sensible notion for
leakage resilience is that if real leakage cannot be distinguished from simulated
leakage (from a simulator that does not have access to the secret key k), it cannot
contain any information about the key. This approach removes the problem of
having to mathematically define a leakage function: instead one gets a concrete
instance of it in the form of an actual simulator. Rather than struggling with
meaningful definitions for what leakage is, and how to practically derive bounds
for it for a concrete device, the new challenge is therefore to define and build
(practical) simulators.

The challenge of building concrete leakage simulators for invertible functions
was also taken up in [19], where the authors suggest an efficient solution: given
some public input x and output c of an invertible scheme f, they explain how
a trace can be constructed with a random key k* that is consistent with the
public inputs x and c. This can be done by choosing a random key k* and
computing d = f_{k*}(x), x_t = f_{k*}^{-1}(c), and then c = f_{k*}(x_t) to generate leakage
traces L(x) = l_l || α, L(x_t) = β || l_r that can be ‘split up’ (as indicated) and
concatenated to a new simulated trace (l_l || l_r). The q-sim game, which can be
played in practice, consists of the attacker trying to distinguish real traces from
simulated traces given q real (or simulated) traces by using whatever state-of-the-art
attacks are available. The rest of [19] consists of two major contributions:
first, they discuss why state-of-the-art side channel attacks cannot win this game
effectively, and, second, they use the game restricted to q = 2 to prove a PRG
construction (using AES as the underlying PRF) leakage resilient in the standard
model.


1.3 Our Contribution 

We show there exists a side channel distinguisher (against Standaert et al.’s
simulator when instantiated with AES) which can effectively distinguish simulated
traces: it does so by detecting the fact that such traces are constructed via splitting and
concatenation in the inner encryption rounds. We do not require knowledge of
input or output, or access to the auxiliary leakage oracles for building templates.
Our attack is based on using cross-correlation to check the consistency of the
data flow across encryption rounds in order to pinpoint inconsistencies from
splitting and concatenating traces, and works across different real-world
platforms. Our distinguisher has the property that playing a game with q traces for
a single key is equivalent to playing a game with one trace for q keys; this implies
we can win the game in an asymptotic setting.
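The split-and-concatenate simulator of [19] and the cross-correlation idea can be illustrated with a deliberately simplified model. The Python sketch below is a toy and not the authors' code. The following are all assumptions:

- an 8-bit, 10-round cipher with a random S-box;
- a leakage model that records the Hamming weight of each round's input and output plus Gaussian noise, so that a round's output is "seen" again when it is loaded as the next round's input;
- all function names.

In real traces, the last point of round 5 and the first point of round 6 leak the same intermediate value, so they are strongly correlated across traces. In simulated traces these two points come from two unrelated computations under k*, so the correlation collapses.

```python
import random

ROUNDS = 10
_setup = random.Random(2014)
SBOX = list(range(256))
_setup.shuffle(SBOX)                  # toy 8-bit S-box (assumption, not AES)
SBOX_INV = [0] * 256
for i, v in enumerate(SBOX):
    SBOX_INV[v] = i


def round_keys(key):
    """Toy key schedule (assumption): derive ROUNDS byte subkeys from an int key."""
    ks = random.Random(key)
    return [ks.randrange(256) for _ in range(ROUNDS)]


def decrypt(key, c):
    s = c
    for k in reversed(round_keys(key)):
        s = SBOX_INV[s] ^ k
    return s


def leak(key, x, rng, sigma):
    """Encrypt x; per round, leak HW(round input) then HW(round output), each
    plus Gaussian noise. Returns (trace, ciphertext)."""
    trace, s = [], x
    for k in round_keys(key):
        trace.append(bin(s).count("1") + rng.gauss(0, sigma))
        s = SBOX[s ^ k]
        trace.append(bin(s).count("1") + rng.gauss(0, sigma))
    return trace, s


def simulate(x, c, rng, sigma, split=ROUNDS // 2):
    """Split-and-concatenate simulator: random key k*, left part from
    encrypting x, right part from encrypting x_t = f_{k*}^{-1}(c)."""
    k_star = rng.randrange(2**32)
    left, _ = leak(k_star, x, rng, sigma)
    right, _ = leak(k_star, decrypt(k_star, c), rng, sigma)
    cut = 2 * split
    return left[:cut] + right[cut:]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def split_correlation(traces, split=ROUNDS // 2):
    """Correlation, across traces, between the last point of round `split`
    and the first point of round `split + 1`."""
    cut = 2 * split
    return pearson([t[cut - 1] for t in traces], [t[cut] for t in traces])


rng = random.Random(1)
key, sigma, q = 0xC0FFEE, 0.5, 2000
real, simulated = [], []
for _ in range(q):
    x = rng.randrange(256)
    trace, c = leak(key, x, rng, sigma)
    real.append(trace)
    simulated.append(simulate(x, c, rng, sigma))
print(round(split_correlation(real), 2), round(split_correlation(simulated), 2))
```

In this toy model the real-trace correlation at the split is roughly Var(HW)/(Var(HW) + sigma^2) = 2/(2 + sigma^2). Raising the noise therefore shrinks it, but does not make it match the near-zero value of the simulated traces. This corresponds to the signal-to-noise discussion that follows.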

We analyse the properties of (cross-)correlation in this specific application to
identify the factors that impact on the ability to win the game efficiently. The
factors are the (intrinsic) cross-correlation between points in leakage traces, and
the ratio between signal and noise. Whilst changing the ratio between signal and
noise is well understood and practically achievable, it is not sufficient with
regard to decreasing an adversary’s chance to win the q-sim game. Our attempts
to work with points that have low intrinsic cross-correlation were only successful
in theory: for the concrete instantiations in our paper, which are based on
AES, such points should (in theory) exist because of the nature of SubBytes.
Theoretically, the input and output of the SubBytes operation are highly
uncorrelated. However, we explain why in practice it remains a challenge to
exploit this. Finally, we suggest a method that indeed withstands the powerful
cross-correlation distinguishers, which is based on instantiating the PRG with a
double block cipher construction. Our proposed simulator then uses a
meet-in-the-middle technique to determine keys that map x to c without introducing
inconsistencies in the data flow. This simulator is somewhat theoretical because
of the implied computation cost. However, it allows the proof given in [19] to
hold once more (remember that this proof crucially depends on the existence of
a simulator).
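The meet-in-the-middle step of the proposed simulator can be sketched as follows. The sketch assumes a hypothetical toy 8-bit block cipher with an 8-bit key; `enc` and `dec` only serve to make the example runnable. Given (x, c), the sketch tabulates E_{k1}(x) for all k1. It then computes D_{k2}(c) for each k2 and looks for matches, returning every key pair (k1, k2) with E_{k2}(E_{k1}(x)) = c. A simulator could then pick one such pair and generate a complete, unspliced trace under it.

```python
import random
from collections import defaultdict

_setup = random.Random(7)
SBOX = list(range(256))
_setup.shuffle(SBOX)                 # toy S-box (assumption)
SBOX_INV = [0] * 256
for i, v in enumerate(SBOX):
    SBOX_INV[v] = i


def enc(k, x):
    """Hypothetical toy 8-bit cipher with an 8-bit key (3 rounds)."""
    for r in range(3):
        x = SBOX[x ^ ((k + r * 0x3D) & 0xFF)]
    return x


def dec(k, y):
    for r in reversed(range(3)):
        y = SBOX_INV[y] ^ ((k + r * 0x3D) & 0xFF)
    return y


def mitm_keys(x, c):
    """All (k1, k2) with enc(k2, enc(k1, x)) == c, found by meet-in-the-middle."""
    middle = defaultdict(list)
    for k1 in range(256):                # forward table: E_k1(x) -> [k1, ...]
        middle[enc(k1, x)].append(k1)
    pairs = []
    for k2 in range(256):                # backward pass: match D_k2(c)
        for k1 in middle.get(dec(k2, c), []):
            pairs.append((k1, k2))
    return pairs


k1, k2, x = 0x12, 0xAB, 0x5C
c = enc(k2, enc(k1, x))
candidates = mitm_keys(x, c)
print((k1, k2) in candidates, len(candidates))
```

With 128-bit keys for each stage, the forward table alone would have 2^128 entries. This illustrates why the text calls such a simulator somewhat theoretical.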

Finally, we note in our work that even the computationally expensive simulator
still requires noisy traces: it would seem, then, that the most natural way to
produce simulated leakage is to just use the construction ‘as is’ and run it with a
random key. For sufficiently noisy traces, even profiling prior to the game will not
help an attacker: noisy leakage implies that an adversary must have sufficiently
many traces to distinguish real from simulated leakage. By limiting q in the
game stage, this will be infeasible. With this somewhat simple fact in mind,
we can argue that leakage can be simulated for any cryptographic primitive or
construction. This, however, requires implementations with high noise.

2 Simulatable Leakage: Standaert et al.’s Model

Before we discuss the model introduced by Standaert et al. in [19], we will
introduce the required notation. The probabilistic leakage of a block cipher will
be given as BC_k(x) ⇝ l := L(k, x), where L is the leakage function, x is the
plaintext and k the secret key. The leakage can be described as a vector
l = (l_1, l_2, ...). For a block cipher, which typically consists of several encryption
rounds, we group the trace points corresponding to a round and indicate this
by placing the round number as a superscript l^i. For AES-128 we can represent a
leakage as l = [l^1, ..., l^10]. We will later need to split and concatenate leakage
vectors. For this purpose we use the shorthand l^{i,j} = [l^i, ..., l^j] to denote that
we take the parts of the leakage vector that correspond to rounds i up to j.
To highlight where we ‘split and concatenate’ within leakage vectors, we use ||
to explicitly mark concatenations. We often need to work with sets of leakages
and so denote such a set with bold typesetting, i.e. 𝐥 now is a set of leakages, and
𝐥^{i,j} now means that we refer to rounds i until j in all leakages in the
set. Finally, if we need to differentiate between points within multiple leakages,
we will use a subscript, i.e. 𝐥_n = (l_{1,n}, l_{2,n}, ...) means we index the n-th point in
each leakage vector in 𝐥. Now that the notation has been defined, we are ready
to discuss the model.


2.1 Model and q-sim Game

For completeness, Fig. 1 describes the q-simulatable leakage game (q-sim) from
[19]. Recall that the intuition captured in the q-sim game is that if an
adversary cannot distinguish real (i.e. depending on the secret key) from
simulated (i.e. depending on a random key) traces, the real traces cannot contain
any information about the secret key. In the game, the adversary can make q
queries to the Enc oracle and receives back the encryption of x under key k
and either the real leakage $L(k,x)$ or the simulated leakage $S_L(k^*, x, c)$. The
adversary can also make $s_A$ queries to the leakage oracle with a chosen key and
message, which represents profiling a device (i.e. the adversary can attempt to
derive, compute, or represent the otherwise unknown leakage function). We
note that this, in particular, allows an adversary to query the leakage function
on the inputs from the game, so templates specifically for the inputs used in
the game can be derived. The last oracle, Gen, can be called only once; it
delivers simulated leakage for a chosen message/key pair where either the real
or the random key of the game is output as the ciphertext. This represents the
fact that encryption keys themselves are often the result of a block cipher
invocation, i.e. in practice (and in the constructions discussed in [19]) the
encryption key used in round r is generated as the output of the block cipher in
the previous round r - 1. The adversary’s advantage is calculated as

$$\mathrm{Adv}^{q\text{-sim}}_{\mathrm{BC},L,S_L}(A) = \left|\Pr[q\text{-sim}(A,\mathrm{BC},L,S_L,1)=1] - \Pr[q\text{-sim}(A,\mathrm{BC},L,S_L,0)=1]\right|.$$
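To make the oracle bookkeeping of the q-sim experiment concrete, the following Python sketch runs the game for caller-supplied functions and estimates the advantage empirically. It is a minimal sketch under stated assumptions: all names (`q_sim_game`, `estimate_advantage`, `toy_bc`, `toy_leak`, `toy_sim`, `consistency_adversary`) are hypothetical, ⊥ is represented by `None`, and the toy "cipher" is a byte-wise XOR with a deliberately poor simulator, so a trivial adversary wins with an advantage close to 1. A real evaluation would plug in AES, measured leakage and a candidate simulator.

```python
import secrets

def q_sim_game(adversary, bc, leak, sim, q, s_a, b, key_bytes=16):
    """One run of the q-sim experiment (after Fig. 1); returns the guess b'."""
    k, k_star = secrets.token_bytes(key_bytes), secrets.token_bytes(key_bytes)
    state = {"i": 0, "j": 0, "l": 0}      # Enc counter, Leak counter, Gen-used flag

    def enc(x):
        state["i"] += 1
        if state["i"] > q:
            return None                   # ⊥: more than q encryption queries
        c = bc(k, x)
        lam = leak(k, x) if b == 0 else sim(k_star, x, c)
        return c, lam

    def gen(z, x):
        if state["l"] == 1:
            return None                   # ⊥: Gen may only be called once
        state["l"] = 1
        return sim(z, x, k) if b == 0 else sim(z, x, k_star)

    def leak_oracle(z, x):
        state["j"] += 1
        if state["j"] > s_a:
            return None                   # ⊥: profiling budget s_A exhausted
        return leak(z, x)

    return adversary(enc, gen, leak_oracle)


def estimate_advantage(adversary, bc, leak, sim, q, s_a, runs=2000):
    """Empirical |Pr[b' = 1 | b = 1] - Pr[b' = 1 | b = 0]| over independent runs."""
    rate = [sum(q_sim_game(adversary, bc, leak, sim, q, s_a, b) for _ in range(runs)) / runs
            for b in (0, 1)]
    return abs(rate[1] - rate[0])


# Hypothetical, deliberately weak stand-ins (NOT a real cipher or simulator).
def toy_bc(k, x):
    return bytes(a ^ b for a, b in zip(k, x))      # byte-wise XOR "cipher"

def toy_leak(k, x):
    return toy_bc(k, x)[0]                          # leaks the first state byte

def toy_sim(k_star, x, c):
    return toy_bc(k_star, x)[0]                     # ignores c, so it is inconsistent with c

def consistency_adversary(enc, gen, leak):
    c, lam = enc(bytes(16))
    return 0 if lam == c[0] else 1                  # guess "simulated" on a mismatch

print(estimate_advantage(consistency_adversary, toy_bc, toy_leak, toy_sim, q=2, s_a=0))
```

The toy simulator fails because its leakage ignores the ciphertext it is handed; the adversary only needs the Enc oracle to notice.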


2.2 Construction 

The model given in [19] is used to prove the security of a leakage-resilient PRG
which is instantiated using block ciphers (we continue the pattern started
in [19] and instantiate the block cipher with AES). This construction is shown
on the left in Fig. 2 and the underlying 2PRG is shown on the right in Fig. 2.
When considering this within the q-sim game we note that each key is only used
twice (once to create the PRG output and once to create the new key) and thus
the value of q = 2 is the one of interest.
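The re-keying pattern, and why each key is used exactly twice, can be illustrated with a short Python sketch. Two assumptions are made loudly here: the 2PRG is taken to be (F_k(p0), F_k(p1)) for two fixed public plaintexts `P0` and `P1` (hypothetical values), and, because the standard library has no AES, the block cipher is replaced by HMAC-SHA256 truncated to 128 bits. This is a stand-in PRF, not the construction's actual instantiation.

```python
import hashlib
import hmac

P0 = bytes(16)                  # hypothetical public plaintext used for re-keying
P1 = b"\x01" + bytes(15)        # hypothetical public plaintext used for the output

def prf(key, block):
    # Stand-in for AES_k(block): HMAC-SHA256 truncated to 128 bits (assumption, not AES).
    return hmac.new(key, block, hashlib.sha256).digest()[:16]

def two_prg(key):
    # Each key is evaluated on exactly two inputs, hence q = 2 in the q-sim game.
    return prf(key, P0), prf(key, P1)   # (next key, output block)

def lr_prg(seed, n_blocks):
    key, out = seed, []
    for _ in range(n_blocks):
        key, y = two_prg(key)           # the old key is discarded after two PRF calls
        out.append(y)
    return out

for block in lr_prg(bytes(16), 3):
    print(block.hex())
```

Each iteration corresponds to one round of the PRG; an adversary observing p rounds therefore sees p different keys with q = 2 traces each.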


2.3 Simulator 


For completeness we also recall the simulator in Fig. 3(a). The simulator takes
in a random key k* as well as a plaintext/ciphertext pair (x, c); note that this
pair was created using a key different from k*. The simulator first encrypts x
under k* and records the leakage. The next step is to decrypt c under k* to get
a new plaintext x'. The final stage is to encrypt x' under k* (note that this
will encrypt to c) and record the leakage. The leakage is then split (in half)
and concatenated such that the first part of the new trace corresponds to the
leakage on x while the second half corresponds to the leakage on c. This
simulator is referred to as the split-and-concatenate ($S_{s\&c}$) simulator and we
will refer to traces from this simulator as s&c-traces.

    Experiment q-sim(A, BC, L, S_L, b):
        k, k* ← {0,1}^n   (sampled uniformly at random)
        (i, j, l) ← (0, 0, 0)
        b' ← A^{Enc(·), Gen(·,·), Leak(·,·)}()
        Return b'

    proc Enc(x):
        i ← i + 1
        if i > q then
            Return ⊥
        end if
        c ← BC_k(x)
        if b = 0 then
            Λ ← L(k, x)
        else
            Λ ← S_L(k*, x, c)
        end if
        Return (c, Λ)

    proc Gen(z, x):
        if l = 1 then
            Return ⊥
        end if
        l ← 1
        if b = 0 then
            Λ ← S_L(z, x, k)
        else
            Λ ← S_L(z, x, k*)
        end if
        Return Λ

    proc Leak(z, x):
        j ← j + 1
        if j > s_A then
            Return ⊥
        end if
        Λ ← L(z, x)
        Return Λ

Fig. 1. q-simulatable leakage from [19]

Fig. 2. Left: Leakage-resilient PRG. Right: 2PRG construction
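As a rough illustration of the three simulator steps, the Python sketch below applies the split-and-concatenate idea to a hypothetical 16-bit, ten-round Feistel cipher instead of AES, with an assumed leakage of one point per round (Hamming weight of the state plus Gaussian noise). The key schedule, round function and all names (`encrypt`, `decrypt`, `s_and_c`) are illustrative assumptions; only the split point (rounds 1-5 from x, rounds 6-10 from x') mirrors Fig. 3(a).

```python
import random

ROUNDS = 10          # mirror AES-128's ten rounds in the toy cipher (assumption)

def round_keys(key):
    # Hypothetical key schedule for a 16-bit toy key (not AES).
    return [((key >> (8 * (i % 2))) ^ (0x3B * i)) & 0xFF for i in range(ROUNDS)]

def f(rk, r):
    # Hypothetical 8-bit round function; a Feistel network is invertible for any f.
    t = r ^ rk
    return ((t * 0x1D) ^ (t >> 3) ^ 0xA5) & 0xFF

def encrypt(key, x, rng):
    """Toy encryption returning (ciphertext, leakage [l^1, ..., l^ROUNDS])."""
    left, right = x >> 8, x & 0xFF
    trace = []
    for rk in round_keys(key):
        left, right = right, left ^ f(rk, right)
        trace.append(bin(left).count("1") + bin(right).count("1") + rng.gauss(0, 0.5))
    return (left << 8) | right, trace

def decrypt(key, c):
    left, right = c >> 8, c & 0xFF
    for rk in reversed(round_keys(key)):
        left, right = right ^ f(rk, left), left
    return (left << 8) | right

def s_and_c(k_star, x, c, rng):
    """Split and concatenate: first half of the rounds from x, second half from x'."""
    _, first = encrypt(k_star, x, rng)               # l^{1,5} || alpha
    x_prime = decrypt(k_star, c)
    c_again, second = encrypt(k_star, x_prime, rng)  # beta || l^{6,10}
    assert c_again == c                              # x' encrypts to c under k*
    half = ROUNDS // 2
    return first[:half] + second[half:]

rng = random.Random(0)
c, real_trace = encrypt(0x1234, 0x0F0F, rng)         # "real" key k = 0x1234
simulated_trace = s_and_c(0xBEEF, 0x0F0F, c, rng)    # random key k* = 0xBEEF
```

Both halves of the simulated trace are individually genuine leakages, which is what makes the simulator plausible; the seam between them is what the distinguisher in Section 4 looks for.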

3 The Security Game for the Practical Use of the 2PRG:
the p-q-sim Game

It is clear that in the q-sim game, the adversary can get only q queries on a
key: this represents only a single round of the PRG. However, based on the
construction of the 2PRG, we argue that the appropriate security game, which
we denote the p-q-sim game, should take into account that potentially many,
say p, calls to the 2PRG are made. That is, in the p-q-sim game the adversary
gets to make q queries against either p real or p simulated instantiations (with
p different keys) and must then work out whether the leakage he is seeing is
real or simulated for all of the p instantiations.

    simulator S_L^{s&c}(k*, x, c):
        c' ← AES_{k*}(x) ⇝ l^{1,5} || α
        x' ← AES_{k*}^{-1}(c)
        c ← AES_{k*}(x') ⇝ β || l^{6,10}
        Return l^{1,5} || l^{6,10}

(a) Description of simulator $S_{s\&c}$
(b) Simulated $S_{s\&c}$ SASEBO-R trace

Fig. 3. Definition for the $S_{s\&c}$ simulator and an exemplary trace

We argue further that the p-q-sim game is in general more appropriate: in an
evaluation context, which is what [19] really considers, the q-sim game would be
played for real, and it seems unlikely that an evaluator would only ever play the
game once and take q traces (especially if q is small). More likely, one would
play this game several times to get a sense of the success rate for q traces.

It would be tempting to believe that an adversary cannot exploit the information
leakage across different games because they are based on different keys.
After all, traditional side channel distinguishers require key-dependent
hypotheses: hence whenever a new key is introduced the attacker is presented
with a new, fresh challenge. Whilst [19] discusses why the original q-sim game
does not hybridise with respect to q, its authors do not consider the possibility
that the game does hybridise in the number of keys p. If the game were to
hybridise, then we would have that
$\mathrm{Adv}^{p\text{-}q\text{-sim}}_{\mathrm{BC},L,S_L}(A) \le p \cdot \mathrm{Adv}^{q\text{-sim}}_{\mathrm{BC},L,S_L}(A)$;
the game gets easier to win if it is played more often.

In the next section we will show a distinguisher that can work across different
keys and thus can take advantage of the full (and more realistic) p-q-sim game.

4 Breaking the Split-and-Concatenate Simulator 

Whilst the proposed simulator can be instantiated with any invertible function,
we continue to follow Standaert et al.’s exposition and use implementations of
AES-128 as running examples. To explain why cross-correlation is an efficient
distinguisher we briefly recall how a typical side channel trace relates to the
information flow in an AES implementation.

4.1 Properties of Real World Leakage 

As argued in previous work [3,8], any implementation of AES will happen as a
sequence of steps that correspond to the processing of intermediate values. For
instance, in a serial implementation, the state bytes will be accessed sequentially
and updated according to the AES round function. As all high level functions
(SubBytes, MixColumns, etc.) are processed by some gates at the lowest level,
a clock signal is involved that governs (to some extent) when data flows
between gates. All changes in gates, in each clock cycle, produce some form of
leakage (time for signals to travel, power consumed, radiation emitted) that
becomes available through observing the device. Figure 4 illustrates the power
consumption measurements from an AES implementation on a simple 8-bit
microcontroller (i.e. all intermediate values are represented as bytes). Evidently,
we can see patterns in Fig. 4(a), which represents a single AES round.

(a) Power trace showing a single AES round. (b) Power trace showing individual
assembly instructions

Fig. 4. Power traces for an 8051 8-bit microcontroller

In the case of this very simple processor, we can even identify the effect (with
regard to shape and height) of individual instructions in the power consumption
by zooming further into the trace, see Fig. 4(b). It now shows only a small
part of the first round, corresponding to the SubBytes operation.
The instructions used for this purpose are register transfers (MOV and MOVC), a
register increment (INC), and conditional branches which correspond to a loop
that runs through the 16 state bytes.

In a parallel implementation, each operation will touch multiple state bytes 
but the sequence of round functions must remain sequential. SubBytes is some- 
times implemented using combinational logic in dedicated hardware [22] . A com- 
binatorial logic circuit is not governed by a clock but the output from such a 
circuit will typically be connected to a synchronous storage element. Referring 
to Fig. 5(c) we can see that there are 10 visible peaks relating to the AES-128 
rounds. 


4.2 Cross-Correlation as a Distinguisher 

The term correlation is often used to refer to a broad class of statistical de- 
pendencies in data. There are different metrics to measure correlation. Most 
commonly used (at least in side channel attacks) is the Pearson correlation co- 
efficient, and we use this metric when we refer to correlation in this article. In 
the context of side channel analysis, correlation has been applied in DPA before 
[2] and its properties as a good distinguisher are well understood [12]. 
Cross-correlation is a term coined in signal processing and is commonly used
to identify (and measure) similarities of waveforms. It has been used in the
context of side channel analysis before, e.g. [13,18,21].

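We make the simple observation that the cross-correlation of signals of length
one (i.e. points) equates to computing the correlation between trace points
$(\mathbf{l}_u, \mathbf{l}_v)$ (as opposed to the correlation with key-dependent predictions in the
DPA context). Equation (1) recalls how (Pearson) correlation is defined and
estimated from N leakage traces:

$$\rho(\mathbf{l}_u, \mathbf{l}_v) = \frac{\mathrm{Cov}(\mathbf{l}_u, \mathbf{l}_v)}{\sqrt{\mathrm{Var}(\mathbf{l}_u)\,\mathrm{Var}(\mathbf{l}_v)}} \approx \frac{\sum_{i=1}^{N} (l_{i,u} - \bar{l}_u)(l_{i,v} - \bar{l}_v)}{\sqrt{\sum_{i=1}^{N} (l_{i,u} - \bar{l}_u)^2 \, \sum_{i=1}^{N} (l_{i,v} - \bar{l}_v)^2}} \qquad (1)$$

where $\bar{l}_u$ and $\bar{l}_v$ denote the sample means of the u-th and v-th points.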
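The estimator in Eq. (1) is short enough to write out directly. The following Python sketch is a plain, dependency-free version; the name `pearson` and the example numbers are illustrative only, not measured leakage.

```python
def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences, as estimated in Eq. (1)."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Point u and point v across five leakage traces (made-up numbers for illustration).
l_u = [2.1, 3.9, 6.2, 7.8, 10.1]
l_v = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson(l_u, l_v))
```

Each argument holds one point across all N traces, i.e. a column of the trace matrix, not a single trace.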


Cross-Correlation Traces. We recall that cryptographic algorithms are
implemented as step-wise processes (with varying degrees of parallelism). Although
AES mixes keying material and input efficiently, so that the correlation between
key, input, and output is small, we can expect a high correlation between
subsequent states, e.g. between the input and output of ShiftRows, as well as
between states that operate on the same data (even though they might be
separated in time). This in turn implies that we can expect a high correlation
between subsequent points within leakage traces (see [10, Ch. 4]), as well as
between points that are related to the same intermediate values. Further to
that, we can also expect high correlations between data that is related to the
program state but independent of the AES states, e.g. the value of the program
counter, pointers to memory locations, etc. Hence any implementation will lead
to a specific cross-correlation trace depending on its data flow.

By producing a cross-correlation trace that shows the cross-correlation of all
pairs of points, i.e. $\{\rho(\mathbf{l}_u, \mathbf{l}_v) : \forall (u,v)\}$, we can consequently track the consistency
of data flow. Such a trace, however, would be very long (its length would be
the square of the original traces’ length). We hence opted to reduce the
cross-correlation data by selecting the highest cross-correlation value for each u
over all ‘distant’ pairs $(\mathbf{l}_u, \mathbf{l}_v)$, i.e. for each u we took
$\rho_u = \max_v \rho(\mathbf{l}_u, \mathbf{l}_v)$, where v is not within a small window around u.
Hence our cross-correlation traces $\boldsymbol{\rho} = \{\rho_u\}$ have the same length as the
actual leakage traces, and trivial, uninformative cross-correlation is removed
because neighbouring points are not considered. Consequently the
cross-correlation traces show the effect of data consistency, and any ‘dip’
implies some form of discontinuity in the data flow.
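The reduction from all pairs to one value per point can be sketched in a few lines of Python. The function name `cross_correlation_trace`, the default window of 2 points and the synthetic traces (in which points 0 and 4 are assumed to leak the same intermediate value) are illustrative assumptions, not the authors' code or data.

```python
import random

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def cross_correlation_trace(traces, window=2):
    """rho_u = max over v with |u - v| > window of rho(l_u, l_v), for every point u."""
    points = list(zip(*traces))          # points[u]: the u-th point of every trace
    m = len(points)
    if m < 2 * window + 2:
        raise ValueError("traces too short for the chosen window")
    return [max(pearson(points[u], points[v]) for v in range(m) if abs(u - v) > window)
            for u in range(m)]

# Synthetic traces of length 6: points 0 and 4 leak the same (hypothetical) value.
rng = random.Random(1)
traces = []
for _ in range(200):
    t = [rng.random() for _ in range(6)]
    shared = rng.random()
    t[0] = shared + 0.01 * rng.random()
    t[4] = shared + 0.01 * rng.random()
    traces.append(t)
print([round(r, 2) for r in cross_correlation_trace(traces)])
```

Points 0 and 4 show a value close to 1 even though they are far apart, while point 2, whose only admissible partner is unrelated, stays low; in real traces such low values at unexpected places are the ‘dips’ described above.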

We provide in Fig. 5 some cross-correlation traces for illustration. We chose
two devices with contrasting architectures to demonstrate that cross-correlation
works irrespective of the underlying device. The first device features a highly
serial implementation of AES where each step only touches at most one byte of
the state. It is representative of AES implementations in the low cost market.
The second device features a highly parallel implementation of AES. Such an
implementation is more likely to be found in high end products where
implementation speed and security are considered important factors.

(a) AES power trace (8051)
(b) Cross-correlation trace (8051)
(c) AES power trace (SASEBO-R)

Fig. 5. AES power traces and cross-correlation plots


4.3 Detecting s&c-traces

In the proof given in [19], an adversary plays the q-sim game and so has access
to the oracles Enc, Leak, and Gen (an adversary may also query them in the
p-q-sim game). Our distinguisher does not require any access to the oracles Leak
and Gen.

To detect the s&c-traces we apply the cross-correlation method to a set of q 
leakage traces. The detection is then based on the absence of cross-correlation 
that would otherwise be present in traces that have not been simulated, i.e. a set 
of points which leak on the same data no longer show a significant correlation 
with respect to each other. 

The q-sim (and p-q-sim) games require that there exists a simulator that is
secure against all adversaries (within set computational limits). We may hence
assume the adversary’s knowledge includes all implementation details of the PRF
as well as the principle of the simulator, i.e. where the traces are split and
concatenated. Recall that a cross-correlation trace shows ‘patterns’ which are
based on the relationships between subsequent and related intermediate values.
Consequently, even a cross-correlation trace from the simulated leakages will
show such patterns, see the right hand side of Fig. 5. Only at the round where
the simulated traces have been split and concatenated will a different pattern
occur. Consequently an attacker gets a ‘fingerprint’ of how the cross-correlation
trace should look (in the case of the s&c simulator) by examining the beginning
and end of the cross-correlation trace. Any significant deviation from the
fingerprint (i.e. any discrepancy between the two) will hence identify the
s&c-traces.
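An end-to-end toy version of this detection can be written in Python under several loud assumptions: a hypothetical 16-bit, eight-round Feistel cipher stands in for AES; each round leaks two noisy Hamming-weight points, HW(L_i) and HW(R_i), so that consecutive rounds share data (L_{i+1} = R_i); traces are pooled over p fresh keys with only q = 2 queries each, as in the p-q-sim game; and, for brevity, the fingerprint comes from real traces the adversary generates itself rather than from the beginning and end of the challenge's own cross-correlation trace. Because the related points are adjacent in this toy trace, only v = u is excluded (window 0). The threshold of 0.4 is an arbitrary illustrative choice.

```python
import random

ROUNDS, SIGMA = 8, 0.5           # toy parameters (assumptions, not AES)

def hw(v):
    return bin(v).count("1")

def round_keys(key):
    # Hypothetical key schedule for a 16-bit toy key.
    return [((key >> (8 * (i % 2))) ^ (0x3B * i)) & 0xFF for i in range(ROUNDS)]

def f(rk, r):
    t = r ^ rk
    return ((t * 0x1D) ^ (t >> 3) ^ 0xA5) & 0xFF

def encrypt(key, x, rng):
    # Two noisy points per round: HW(L_i) and HW(R_i). Since L_{i+1} = R_i,
    # consecutive rounds share data, mimicking a consistent data flow.
    left, right, trace = x >> 8, x & 0xFF, []
    for rk in round_keys(key):
        left, right = right, left ^ f(rk, right)
        trace += [hw(left) + rng.gauss(0, SIGMA), hw(right) + rng.gauss(0, SIGMA)]
    return (left << 8) | right, trace

def decrypt(key, c):
    left, right = c >> 8, c & 0xFF
    for rk in reversed(round_keys(key)):
        left, right = right ^ f(rk, left), left
    return (left << 8) | right

def s_and_c(k_star, x, c, rng):
    cut = 2 * (ROUNDS // 2)      # number of points in the first half of the rounds
    first = encrypt(k_star, x, rng)[1]
    second = encrypt(k_star, decrypt(k_star, c), rng)[1]
    return first[:cut] + second[cut:]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5

def cross_correlation_trace(traces, window=0):
    points = list(zip(*traces))
    m = len(points)
    return [max(pearson(points[u], points[v]) for v in range(m) if abs(u - v) > window)
            for u in range(m)]

def challenge(real, p, q, rng):
    """p fresh keys with q queries each; real or s&c-simulated leakage."""
    traces = []
    for _ in range(p):
        k, k_star = rng.getrandbits(16), rng.getrandbits(16)
        for _ in range(q):
            x = rng.getrandbits(16)
            c, trace = encrypt(k, x, rng)
            traces.append(trace if real else s_and_c(k_star, x, c, rng))
    return traces

def looks_simulated(fingerprint, traces, threshold=0.4):
    """Flag traces whose cross-correlation dips well below the fingerprint anywhere."""
    observed = cross_correlation_trace(traces)
    return max(a - b for a, b in zip(fingerprint, observed)) > threshold

rng = random.Random(2014)
fingerprint = cross_correlation_trace(challenge(True, 200, 2, rng))
real_set = challenge(True, 200, 2, rng)
sim_set = challenge(False, 200, 2, rng)
print(looks_simulated(fingerprint, real_set), looks_simulated(fingerprint, sim_set))
```

No key hypothesis is made anywhere: the points around the seam (the last point of round 4 and the first of round 5) lose their partner in the simulated traces, which is why pooling traces from many different keys still helps the adversary.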


4.4 Experiments for Real Devices 

Practical attacks are often played down because they are device specific and
hence it is often not possible to draw general conclusions from a single attack.
However, the heart of the proposal in [19] is to be able to implement a secure
simulator on real world devices. We consequently did not want to resort
to ‘pure simulations’ and opted to fully implement simulators for several real
world devices. By choosing different devices we can counter the argument that
our analysis outcomes are not valid in general: the devices we chose have
different architectures that lead to different AES implementations, different
leakage models and different noise characteristics.

AES can be implemented in different ways. Serial implementations are often
found on small processors (8-bit or 32-bit). These are software-only
implementations and we have used two different widely-used processors to
instantiate such a software implementation for our attacks. Parallel
implementations can be found as dedicated hardware implementations. In practice
32-bit implementations would be considered suitable for constrained devices such
as smart cards, whereas highly parallel 128-bit architectures would be used when
throughput is a practical concern. We opted to use a highly parallel 128-bit
architecture to provide contrasting results to the software implementations. In
the following, we give a brief overview of the results obtained from each
architecture. Details regarding the acquisition setup and target devices can be
found in Appendix A.


Software Implementations. We used a general purpose microcontroller which
features an 8051 instruction set as the first device for our attacks. The
cross-correlation plots for this architecture reveal detailed information about
the data flow during the execution of an algorithm. In our running example
(AES-128), we are able to detect the order in which state bytes and functions are
accessed, the operations performed and, for a masked implementation, when and
where each mask is applied. In Figs. 6(a) and 6(b) we show a portion of the
cross-correlation for real and simulated leakage traces respectively. There is a
clear break in the cross-correlation as a result of the concatenation between
traces with inconsistent states. We repeated the s&c experiment with an
ARM-based¹ 32-bit architecture and implementation. As for the 8051, we can track
the AES data flow and so the simulated leakage trace, once again, leads to a drop
in the cross-correlation.


1 The cross-correlation plots for this device are available in the full paper 1 0 . 




J. Longo et al. 





(c) Real trace (SASEBO-R) (d) Simulated trace (SASEBO-R) 


Fig. 6. Cross-correlation distinguisher plots 

Hardware Implementation. The SASEBO-R ASIC boasts a large number
of cryptographic functions implemented as dedicated logic. Unlike the two
previous devices, the information leaked is no longer dependent on processor
operations but on combinatorial switching. Figures 5(c) and 3(b) show the power
consumption over an execution of AES and a simulated trace generated by the
s&c simulator respectively. The cross-correlation distinguisher now plays on the
coherency of the combinatorial switching rather than the state leakage at each
clock cycle. As with the software implementations, we can easily identify the
simulated leakage traces, see Figs. 6(c) and 6(d). One key difference compared
with the software implementations is that cross-correlation no longer reveals any
information about the data flow or what operations are being performed but
simply that there exists some data dependency in the power consumption of the
device.


4.5 Measures to Secure the Split-and-Concatenate Simulator 

Recall that we estimate the cross-correlation between trace points, which implies
varying inputs and/or keys. The number of different inputs per key is q, whereas
the number of different keys is p. We may hence decide to increase p and keep
q small, which implies that we can break the PRG construction of [19] because
it has multiple rounds. As our distinguisher hybridises, the construction loses
security every time a round is leaked, even though fresh keys are used!

We now discuss how traditional engineering-style countermeasures impact
the success of winning the game. We begin by showing how noise on leakage
traces makes winning the game harder. We then explain why masking
and hiding approaches are unlikely to defeat the cross-correlation distinguisher.

Increasing the Noise. Just as we can write down a formula for the impact 
of the signal-to-noise ratio (SNR) on a correlation-based DPA attack (see, e.g. 
[10, Ch. 4.3.1]), we can express the impact of the SNR on our proposed cross- 
correlation attack. 

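For this we write leakage points as a direct sum of an (unknown) signal (S)
plus independent (Gaussian) noise (N), i.e. $\mathbf{l}_u = S_u + N_u$ with $N_u \sim \mathcal{N}(0, \sigma_u)$,
and $\mathbf{l}_v = S_v + N_v$ with $N_v \sim \mathcal{N}(0, \sigma_v)$ [3,11]. The signal-to-noise ratio (SNR)
is defined as $\mathrm{SNR} = \mathrm{Var}(S)/\mathrm{Var}(N)$. The respective SNRs at the points u and v are
then $\mathrm{SNR}_u = \mathrm{Var}(S_u)/\mathrm{Var}(N_u)$ and $\mathrm{SNR}_v = \mathrm{Var}(S_v)/\mathrm{Var}(N_v)$. Because the noise is
independent of the signal and across points, $\mathrm{Cov}(\mathbf{l}_u, \mathbf{l}_v) = \mathrm{Cov}(S_u, S_v)$ and
$\mathrm{Var}(\mathbf{l}_u) = \mathrm{Var}(S_u)(1 + 1/\mathrm{SNR}_u)$, and likewise for v. Making the simplifying
assumption that $\mathrm{SNR}_u \approx \mathrm{SNR}_v = \mathrm{SNR}$, it turns out (using the same technique
as [10, Ch. 4.3.1]) that

$$\rho(\mathbf{l}_u, \mathbf{l}_v) = \rho(S_u, S_v) \cdot \frac{1}{1 + \frac{1}{\mathrm{SNR}}}.$$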
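The attenuation formula can be checked numerically. The Python sketch below draws unit-variance Gaussian signals with a chosen correlation `rho_s` (an assumption made for the sketch; the derivation itself only needs independent noise and equal SNRs), adds independent noise with variance 1/SNR, and compares the empirical correlation with ρ(S_u, S_v)/(1 + 1/SNR). For comparison, `dpa_attenuation` applies the factor 1/sqrt(1 + 1/SNR) that applies to correlation DPA; all function names are hypothetical.

```python
import random

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5

def snr_attenuation(rho_s, snr, n=50_000, seed=0):
    """Return (empirical rho(l_u, l_v), predicted rho_s / (1 + 1/SNR))."""
    rng = random.Random(seed)
    noise_sd = (1.0 / snr) ** 0.5            # Var(S) = 1, so Var(N) = 1/SNR
    l_u, l_v = [], []
    for _ in range(n):
        s_u = rng.gauss(0, 1)
        s_v = rho_s * s_u + (1 - rho_s ** 2) ** 0.5 * rng.gauss(0, 1)  # corr(S_u, S_v) = rho_s
        l_u.append(s_u + rng.gauss(0, noise_sd))
        l_v.append(s_v + rng.gauss(0, noise_sd))
    return pearson(l_u, l_v), rho_s / (1 + 1 / snr)

def dpa_attenuation(rho, snr):
    # Correlation DPA: only the trace side is noisy, giving 1 / sqrt(1 + 1/SNR).
    return rho / (1 + 1 / snr) ** 0.5

for rho_s, snr in [(1.0, 8.0), (0.9, 1.0), (0.5, 0.25)]:
    empirical, predicted = snr_attenuation(rho_s, snr)
    print(f"rho_s={rho_s}, SNR={snr}: empirical {empirical:.3f}, predicted {predicted:.3f}, "
          f"DPA-style {dpa_attenuation(rho_s, snr):.3f}")
```

Noise enters on both points, so the square root of the DPA case disappears; this is the sense in which the SNR has a potentially stronger impact on the cross-correlation distinguisher.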

In comparison to a DPA attack (we refer to [10, Ch. 6.3]), where the correlation
is attenuated by a factor of $1/\sqrt{1 + 1/\mathrm{SNR}}$, the impact of the SNR is potentially
stronger on this distinguisher. However, because this distinguisher hybridises
over different keys, it is also easier to gather more leakage traces to
compensate for this. In particular, it follows that asymptotically, over an
increasing number of keys p, we can still win the game for any small q.
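To put rough numbers on "asymptotically over an increasing number of keys p", the Python sketch below combines the attenuation formula with a rule of thumb based on Fisher's z-transform that is widely quoted for correlation DPA, n ≈ 3 + 8 (z_{1-α} / ln((1+ρ)/(1-ρ)))². Applying this DPA heuristic to the cross-correlation distinguisher, taking ρ(S_u, S_v) = 1 (two points leaking the same value), and the confidence level α = 10⁻⁴ are all assumptions of the sketch; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def traces_needed(rho, alpha=1e-4):
    """Rule-of-thumb trace count to tell a correlation rho apart from zero (assumption)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2

def keys_needed(snr, q=2, rho_signal=1.0, alpha=1e-4):
    """Fresh keys p needed when each key contributes only q traces."""
    rho = rho_signal / (1 + 1 / snr)          # attenuation of the cross-correlation
    return math.ceil(traces_needed(rho, alpha) / q)

for snr in (8.0, 1.0, 0.1, 0.01):
    print(f"SNR = {snr:>5}: about {keys_needed(snr)} keys with q = 2")
```

The required number of keys grows quickly as the SNR drops, but it stays finite for any fixed q, which is the point made above.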

More Complex/Parallel Architectures. Another important observation at
this point is that, in contrast to DPA attacks, the ‘relative size’ of the
intermediate value (in terms of its bit length compared with the overall device
state) is less important. At first glance this goes against the intuition from
DPA style attacks, where practice has shown that they become harder for
architectures that employ a larger data-path (and so the intermediate values that
are attacked contribute only a small amount to the overall leakage). For
instance, DPA style attacks on 32-bit processors often only predict 8 bits of the
32-bit state, so the remaining 24 bits are noise. In addition, DPA style attacks
using correlation require one to ‘model’ the leakage behaviour, and especially
for dedicated hardware, standard models such as the Hamming weight or Hamming
distance are less than ideal approximations.

The cross-correlation distinguisher, however, does not require a model of the
device leakage and, importantly, we are effectively using the entire state
information because we are working with points and do not have to make any
predictions. This explains the contrast to typical DPA style attacks, and so the
highly effective nature of the distinguisher.

Masking and Hiding Approaches. It would be tempting to assume that
any countermeasure against correlation-based DPA attacks would automatically
work against the cross-correlation distinguisher because both distinguishers
share the same statistical method.

    simulator S_L^{2c}(k*, x, c):
        c', ST_5 ← AES_{k*}(x) ⇝ l^{1,4} || α
        x' ← AES_{k*}^{-1}(c)
        c, ST_6 ← AES_{k*}(x') ⇝ β || l^{6,10}
        Construct k# using ST_5, ST_6
        ST_6 ← AES1_{k#}(ST_5) ⇝ l^5
        Return l^{1,4} || l^5 || l^{6,10}

Fig. 7. Simulator description and cross-correlation comparison for $S_{2c}$


The two main classes of engineering countermeasures are hiding and
masking [10]. Hiding countermeasures typically reduce the SNR and so increase
the number of leakage traces that are needed for a successful attack.

Masking (i.e. secret sharing) countermeasures aim to make exploiting the
information impossible by distributing it over different intermediate values (and
hence leakage points), such that it becomes increasingly infeasible to
‘recombine’ that information. In practice, however, it is not possible to
implement secret sharing with many shares. Typically only two, or at most three,
shares are used and the masks (i.e. randomness) are not refreshed between rounds
or between invocations of an intermediate value [5]. Consequently, practical
masking schemes maintain the consistency between the subsequent transformations
on the state and so the cross-correlation distinguisher remains applicable.

5 The Challenge of Making Secure Simulators 

Given that traditional countermeasures are not suitable to rescue the
split-and-concatenate simulator, we have to come up with new ideas. Two
approaches are (intuitively) worth pursuing. The first is to try to maintain
state consistency across the concatenated traces. The second is to split where
there is a ‘natural’ discontinuity in the data flow.


5.1 Maintaining State Consistency 

Over the execution of any algorithm there exists a degree of consistency be- 
tween the intermediate values, which is disrupted by splitting traces. Hence, we 
attempted to design a ‘state aware’ simulator by generating an ‘intermediate 
round’ that ensures such consistency. 

This simulator is shown in Fig. 7, and the extra notation can be understood as
follows: $ST_i$ is the state of AES at the start of the i-th round and $\mathrm{AES1}_k$ runs a
single round of AES on round key k.

The simulator $S_{2c}$ operates similarly to $S_{s\&c}$ by first performing an encryption
of x under the key k*. The leakage captured from this corresponds to the first
four rounds, $l^{1,4}$. We also store the state $ST_5$. Next, the encryption of x' under
k* is performed and the leakage of the last five rounds, $l^{6,10}$, is captured along
with $ST_6$. To connect the two otherwise disconnected states we generate an extra
trace $\mathrm{AES1}_{k^\#}(ST_5) \rightsquigarrow l^5$. Note that finding the key $k^\#$ which maps $ST_5$ to $ST_6$
is simple considering only a single round of AES.
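Why finding k# is simple can be seen from the structure of a full AES round: the round key is XORed in last, so k# = ST_6 ⊕ MixColumns(ShiftRows(SubBytes(ST_5))). The Python sketch below implements one standard (non-final) AES round as specified in FIPS-197, with the S-box computed from the field inverse and affine map rather than tabulated; the column-major state layout and the helper names (`aes1`, `bridging_key`) are illustrative choices, and the states are random stand-ins for ST_5 and ST_6.

```python
import random

def xtime(a):
    """Multiply by 0x02 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a, b = xtime(a), b >> 1
    return result

def rotl8(v, n):
    return ((v << n) | (v >> (8 - n))) & 0xFF

def sbox_entry(x):
    inv = 1
    for _ in range(254):                 # x^254 is the inverse of x (and maps 0 to 0)
        inv = gf_mul(inv, x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63

SBOX = [sbox_entry(x) for x in range(256)]

def sub_bytes(s):
    return [SBOX[b] for b in s]

def shift_rows(s):
    # Column-major state: byte (row r, column c) sits at index r + 4*c.
    return [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]

def mix_columns(s):
    out = []
    for c in range(4):
        a0, a1, a2, a3 = s[4 * c:4 * c + 4]
        out += [gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
                a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
                a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
                gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2)]
    return out

def aes1(state, round_key):
    """One full AES round: MixColumns(ShiftRows(SubBytes(state))) XOR round key."""
    return [a ^ k for a, k in zip(mix_columns(shift_rows(sub_bytes(state))), round_key)]

def bridging_key(st5, st6):
    """Round key k# with aes1(st5, k#) == st6; the unkeyed part of the round is public."""
    return [a ^ b for a, b in zip(mix_columns(shift_rows(sub_bytes(st5))), st6)]

rng = random.Random(7)
st5 = [rng.randrange(256) for _ in range(16)]   # stand-in for ST_5 from the x-encryption
st6 = [rng.randrange(256) for _ in range(16)]   # stand-in for ST_6 from the x'-encryption
k_sharp = bridging_key(st5, st6)
```

The ease of this step is also the simulator's weakness: k# is unrelated to the round keys that k* would produce, which is consistent with the key-schedule discontinuity that the distinguisher picks up.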

We proceeded to implement the $S_{2c}$ simulator for the 8051 and the resulting
cross-correlation was once again able to detect the simulated traces. This time
it detected the discontinuity in the round key schedule. This shows how hard it
is to achieve state consistency, because we have to take into account the AES
state as well as the AES key schedule (in the p-q-sim game), and the fact that
different instructions can leak subtly differently.


5.2 Leveraging an Algorithm-Dependent Data-Flow Discontinuity 

Given our running example is AES, the natural candidate intermediate value is 
SubBytes, because the input and output of SubBytes are (almost) statistically 
uncorrelated. We can hence expect that there is also only a low correlation be- 
tween the corresponding trace points, which implies that the data flow across 
SubBytes is somewhat discontinued. We tested this idea first on an 8051 software
emulator which produces noise-free leakage on the data processed at each
instruction. The simulated traces were indeed indistinguishable from the real
traces produced by the emulator. Consequently we attempted to implement
this on real devices.

On real devices, one has two implementation choices for SubBytes. Either a 
table is stored and hence a SubBytes computation corresponds to a table lookup 
operation. This is suitable for somewhat ‘serial’ implementations, as typically 
only a single instance of the table is held in logic. This is the option used in 
our AES software implementations. Alternatively, one implements it as a
combinational circuit, which is hence suitable for dedicated hardware platforms.
This option is used in the hardware AES on the SASEBO-R.

Our new simulator $S_{sbc}$ ‘tweaks’ the $S_{s\&c}$ simulator construction to perform
the split-and-concatenate at the S-box lookup rather than at points of ‘no
activity’. However, we note that for a sequential lookup, the simulator is
required to perform a splice at each S-box rather than a simple
split-and-concatenate, as illustrated by Fig. 8(a) (hence we effectively ‘chop
up’ a trace).


A Practical Attempt for an 8051 Processor. Implementing this for a real
8051 device reveals the challenges of real world (imperfect) leakage. Although
we could pinpoint the exact location of the SubBytes operation, it so happens
that each low-level operation is performed over multiple clock cycles and hence
leaks multiple times for each operand. To be precise: consider the lookup
$r_0 = M[A]$, where a register $r_0$ is loaded with the contents of memory address A.
The resulting leakage trace resembles the form $[C(A) \| C(M[A]) \| C(A) \| C(M[A])]$
for some leakage function C in our 8051 processor.

As a result, the simulator for the 8051 needs to splice multiple times within
each S-box lookup. This behaviour is clearly very architecture specific.
Figure 8(b) shows the cross-correlation resulting from splicing at each S-box;
the top plot (printed in blue) shows the cross-correlation trace as derived from
real traces. The middle plot (printed in red) shows the cross-correlation trace
as derived from simulated ($S_{sbc}$) traces. Whilst a visual detection seems
difficult at first, an adversary with information about the time points (or a
fingerprint, which, as explained previously, can be constructed even from
simulated traces) can spot the difference. This is made clear by the lowest plot
(printed in black), which shows that there is a distinct difference between real
and simulated cross-correlation.

(a) A visual illustration for $S_{sbc}$ (b) Cross-correlation comparison for $S_{sbc}$

Fig. 8. Illustration for a serial $S_{sbc}$ simulator and a cross-correlation comparison of
$S_{sbc}$ for the 8051 microcontroller

A Practical Attempt for the SASEBO-R. We now consider the implica- 
tions of a parallel combinatorial SubBytes function as performed by the 
SASEBO-R ASIC. Pinpointing the SubBytes operation is no longer such a trivial 
task as each round function is evaluated as a combinatorial circuit rather than 
being governed by an external clock. Attempting to model the leakage in an ideal 
setting would also require significant knowledge of both the design and layout 
of the ASIC. We hence resorted to an exhaustive search over a whole encryption
round in order to determine whether or not a point existed that would allow us
to build the $S_{sbc}$ simulator for such a device.

Perhaps unsurprisingly, we were unable to identify points that did not produce
a significant drop in the cross-correlation. This is primarily due to the relation
between evaluation stages of a combinatorial circuit. Without further insight
into the design of the ASIC, it is impossible to determine which processes made
building a viable $S_{sbc}$ simulator impractical.

6 A Sound Simulator 

The previous section has shown that the intuitions for building secure simulators
failed to translate to the practical devices that we considered. Furthermore,
how they failed leaves us with little confidence that using other platforms, e.g.
embedded platforms (with other processors) or a different combinational circuit
for SubBytes, would be any more successful. The fundamental hurdle is simply
that real world leakage is complex, and cryptographic algorithms are necessarily
implemented via step-wise processes. Hence there is a specific data flow that will
be somehow disrupted when using a split-and-concatenate approach. Without
substantial alterations to the devices’ designs this cannot be easily changed.

    simulator S_L(x, c):
        Perform a meet-in-the-middle attack to learn a valid (k_1*, k_2*)
        BC_{k_2*}(BC_{k_1*}(x)) ⇝ Λ
        Return Λ

Fig. 9. A generic simulator S secure against the cross-correlation distinguisher

Fig. 10. The adjusted PRG construction

Splitting within one instantiation of a block cipher seems hence futile: but how 
about considering constructions that are based on two somewhat independent 
block cipher calls? 


6.1 Doubling the Cipher 

Now we discuss an approach based on using a double block cipher (i.e. a block
cipher 2BC that consists of two sequential computations of a block cipher BC
with independent keys $k_1$ and $k_2$): $c = \mathrm{BC}_{k_2}(\mathrm{BC}_{k_1}(x))$. In this construction
there is a natural discontinuity between the first and the second encryption with
regard to the key state and so the data flow across the ‘boundary’ between the
first and the second encryption. This makes this boundary an ideal place to
‘split’ traces, and a generic simulator S that uses a meet-in-the-middle
technique [9] to find a suitable pair of keys follows immediately (see Fig. 9).
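The key-search step of the simulator in Fig. 9 can be illustrated at toy scale. The Python sketch below uses a hypothetical 16-bit, four-round Feistel cipher in place of AES (key schedule and round function are assumptions) and looks for any pair (k1, k2) with enc(k2, enc(k1, x)) = c. Because any valid pair will do, it only tabulates a random subset of 2^10 first keys and scans second keys until a decryption of c lands in the table, a birthday-style match; for a 16-bit block roughly 2^6 second keys are expected. The same reasoning with about 2^64 keys on each side gives the roughly 2^65 AES encryptions quoted for AES.

```python
import random

def round_keys(key, rounds=4):
    # Hypothetical key schedule for the 16-bit toy cipher (not AES).
    return [((key >> (8 * (i % 2))) ^ (0x3B * i)) & 0xFF for i in range(rounds)]

def f(rk, r):
    t = r ^ rk
    return ((t * 0x1D) ^ (t >> 3) ^ 0xA5) & 0xFF

def enc(key, x):
    left, right = x >> 8, x & 0xFF
    for rk in round_keys(key):
        left, right = right, left ^ f(rk, right)
    return (left << 8) | right

def dec(key, c):
    left, right = c >> 8, c & 0xFF
    for rk in reversed(round_keys(key)):
        left, right = right ^ f(rk, left), left
    return (left << 8) | right

def find_key_pair(x, c, table_size=1 << 10, seed=0):
    """Meet in the middle: return any (k1, k2) with enc(k2, enc(k1, x)) == c, or None."""
    rng = random.Random(seed)
    table = {}
    for _ in range(table_size):
        k1 = rng.getrandbits(16)
        table.setdefault(enc(k1, x), k1)       # middle value -> first key
    for k2 in range(1 << 16):
        k1 = table.get(dec(k2, c))             # does c decrypt to a tabulated middle value?
        if k1 is not None:
            return k1, k2
    return None

x = 0x1234
c = enc(0xBEEF, enc(0xCAFE, x))                # "real" double encryption
print(find_key_pair(x, c))
```

The recovered pair generally differs from the real keys, which is exactly what the simulator needs: genuine, fully consistent double-encryption leakage for (x, c) under keys unrelated to the secret ones.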

For completeness we show that this simulator can be plugged into the PRG
construction from [19] while maintaining the correctness of the proof. We only
switch out the underlying PRF from AES to double AES, and subsequently we need
to switch the 2PRG for a 3PRG for the extra rekeying material. The proof given in
[19] can be extended to any constant number of calls to the PRF and thus the
construction does not need to be proved secure again. The resulting construction
can be seen in Fig. 10.
6.2 Some Final Considerations

In the case of AES, this simulator requires approximately $2^{65}$ AES encryptions (because of the meet-in-the-middle technique) per valid trace. This is computationally too expensive to be practical. Yet the simulator is secure against the cross-correlation distinguisher by design, so a practical implementation is not necessary in that regard.
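A birthday-style count is one way to arrive at the figure of roughly $2^{65}$. It rests on three assumptions: the cipher is AES-128 (128-bit keys and blocks); the simulator needs only some valid key pair, not the keys actually in use; and intermediate values behave like random 128-bit strings. Tabulate $2^{64}$ forward encryptions of $x$ under keys $k_1 \in S_1$, then compute $2^{64}$ backward decryptions of $c$ under keys $k_2 \in S_2$. Each match gives a valid pair:

```latex
\begin{align*}
  \mathbb{E}\bigl[\#\{(k_1,k_2) \in S_1 \times S_2 : \mathrm{BC}_{k_1}(x) = \mathrm{BC}^{-1}_{k_2}(c)\}\bigr]
    &\approx \frac{|S_1|\,|S_2|}{2^{128}} = \frac{2^{64} \cdot 2^{64}}{2^{128}} = 1,\\
  \text{cost} &\approx |S_1| + |S_2| = 2^{64} + 2^{64} = 2^{65}\ \text{AES calls}.
\end{align*}
```

The forward table also needs storage for about $2^{64}$ entries.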

However, its security against standard DPA attacks still needs to be considered. Recall that this setting already gives an advantage, because DPA attacks do not hybridise over different keys!

Considering now a standard DPA on $S^L$, one would notice (in the process of performing such an attack) that no key hypothesis ever achieves a good correlation with the simulated traces. Hence, for a DPA distinguisher, we need to consider how many traces are necessary to decide with some certainty that all key candidates are equally likely. We leave this as an open question, but note that the usual arguments for DPA success (or lack thereof) will apply: for a (reasonably) low SNR and a small number $q$ of leakage traces, an attack does not succeed. In particular, for the purposes of the PRG construction from [19], which limits $q$ to two, practical instantiation of S on an ASIC such as the SASEBO-R should be feasible.
A Acquisition Setup and Target Devices

In this appendix we outline the equipment used to gather the side-channel data and provide a brief note about each of the target devices used throughout the paper.

A.1 Acquisition Setup

The hardware used throughout the experiments follows a typical acquisition setup commonly found in the literature [?], [?, Ch. 3]. The measurement apparatus used for each of the experiments is as follows:
— Tektronix DPO7104 1 GHz Digital Oscilloscope.

— Tektronix P7330 High-performance differential probe. 

— TTI EX354 Stable bench power supply. 

— Agilent 33220 Signal generator. 

The power consumption of each device was captured by measuring the voltage drop across a resistor placed in the ground return path of the device.

A.2 The AT89S8253 8051 Microcontroller

The AT89S8253 [1] is an 8-bit microcontroller which represents the lower end of the hardware market. The device can be found in smartcards and is well documented in the side-channel community for its Hamming weight leakage model. It was used in conjunction with the DPA Demo board from IAIK-TU [20]. The AES implementation was limited to an 8-bit serial implementation due to the architectural constraints. Each of the S-box operations was executed as a table lookup. The device was clocked at 12 MHz throughout all experiments, and the oscilloscope was set to capture the power signal at 200 MS/s.

A.3 LPC2124 ARM7TDMI NXP Microcontroller

The LPC2124 [14] is a 32-bit RISC microcontroller with a three-stage pipeline. This device serves to represent the mid-range market of microcontrollers with 32-bit architectures. A custom board was designed and used to facilitate power measurement.

The AES implementation consisted primarily of 32-bit operations to build each of the round functions (AddRoundKey, ShiftRows, etc.), with the exception of SubBytes, which was performed as an 8-bit lookup table. The device was clocked at 14 MHz throughout all experiments, and the oscilloscope was set to capture the power signal at 250 MS/s.

A.4 SASEBO-R Cryptographic LSI

The SASEBO (Side-channel Attack Standard Evaluation Board) project aimed to provide development kits to facilitate side-channel research. The SASEBO-R [17] board is specifically designed to host cryptographic LSIs [16]. The AES core used throughout this paper is the AES2 instantiation. Both the clock and the power regulation for the target ASIC are managed on the SASEBO-R board. The oscilloscope was set to capture the power signal at 2 GS/s.
Abstract. Following the pioneering CRYPTO ’99 paper by Kocher et al., differential power analysis (DPA) was initially geared around low-cost computations performed using standard desktop equipment, with minimal reliance on device-specific assumptions. In subsequent years, the scope was broadened by, e.g., making explicit use of (approximate) power models. An important practical incentive for doing so is to reduce the data complexity of attacks, usually at the cost of increased computational complexity. It is this trade-off which we seek to explore in this paper. We draw together emerging ideas from several strands of the literature (high performance computing, post-side-channel global key enumeration, and effective combination of separate information sources) to advance (non-profiled) ‘standard DPA’ towards a more realistic threat model in which trace acquisitions are scarce but adversaries are well resourced. Using our specially designed computing platform (including our parallel and scalable DPA implementation, which allows us to work efficiently with as many as $2^{32}$ key hypotheses), we demonstrate some dramatic improvements that are possible for ‘standard DPA’ when combining DPA outcomes for several intermediate targets. Unlike most previous ‘information combining’ attempts, we are able to show that the improvements apply even when the exact trace locations of the relevant information (i.e. the ‘interesting points’) are not known a priori but must be searched simultaneously with the correct subkey.


1 Introduction 

Differential power analysis (DPA) was initially conceived as a computationally ‘cheap’ way to recover secret information from side-channel leakages, under the assumption that trace measurements could be easily acquired [14]. Over time, the emphasis has changed and several directions have been pursued in the literature, e.g. attacks using power models [6] and attacks using several trace points [?] ([16] surveys the many variations of DPA-style attacks). Across all these directions, one ‘measure’ of attack success has emerged and now dominates the scientific discourse with regard to attack efficiency: the number of power traces needed to identify the correct (sub)key¹.

¹ The overall key recovery follows a divide-and-conquer strategy: each byte (for example) of the key is attacked and recovered individually.

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, PART I, LNCS 8873, pp. 243 42611 2014. 

(c) International Association for Cryptologic Research 2014 
What is the purpose of considering (sub)key recovery attacks? From a practical perspective, any strategy is considered successful if it reveals ‘enough’ information about the (global) key to enable a brute-force search. It is crucial that side-channel resistance, like other aspects of security, be considered with respect to realistic threat models. Real-world adversaries are then (arguably) mostly interested in exploring trade-offs between the number of leakage traces available and the computational resources dedicated to extracting as much information as possible from those traces. Recent work by Veyrat-Charvillon et al. [25,26] presents an algorithm for searching the candidate space containing the key. It also gives a means to estimate the size of this space when the enumeration capabilities of the analyst fall short of those of a better-resourced adversary.

Resources, from the point of view of a contemporary DPA adversary, include not only sophisticated measurement equipment but, crucially, also processing capabilities that directly map to the time necessary to mount and complete attacks [8]. Moradi et al.’s recent work [18] demonstrates how a handful of modern graphics cards allows for a dramatic increase in processing capabilities. This enables an attack on 32-bit key hypotheses in a known-point scenario (the leakage point corresponding to the attacked operation was determined a priori via a known-key attack).

In this paper we explore the possibilities for sophisticated use of modern processing capabilities to facilitate ‘multi-target’ DPA attacks. By such capabilities we mean those associated with high performance computing (HPC), albeit restricted to the setting of a few machines or a ‘small’ cluster. Multi-target DPA consists of amalgamating outcomes from multiple single-target attacks, with the aim of reducing global key entropy more quickly than an individual single-target attack. For example, against a sequential AES implementation, multi-target DPA could amalgamate the outcomes of standard attacks on the AddRoundKey, SubBytes, and MixColumns operations. We will show later that we can do this meaningfully, and also efficiently, for correlation-based DPA attacks. This holds even in realistic scenarios where the exact leakage points for those target functions are not known and must each be searched within windows of the trace. Most importantly, we show that such attacks can dramatically outperform single-target attacks and are by far the best strategy to minimise the number of leakage traces required.

1.1 Our Contribution 

An adversary who is capable of attacking large numbers of key hypotheses has a greater choice of intermediate target functions to attack. For instance, possible AES targets include the output of AES MixColumns (involving four bytes of the secret key) as well as the (implementation-dependent) intermediate computations for MixColumns (involving two or three key bytes at once). A sequential AES implementation (as typically found on microprocessors) offers a potential plethora of intermediate value combinations. We therefore investigate how effectively some of the possible combinations reduce key guessing entropy. We also touch on the possibility of combining different distinguisher outputs, and explain when this is (or is not) going to be helpful.


Pushing DPA Beyond the Limits of a Desktop Computer 245 


We also take inspiration from the suggestion of Veyrat-Charvillon et al. [25,26] (originally for the purposes of a key enumeration algorithm) that probability distributions on the subkeys can be derived from the outcome of a DPA attack. We propose an alternative (more conservative) heuristic for assigning ‘probability’ scores to subkeys. We then show how these scores can be used to simply and usefully combine information from multiple standard univariate DPA attacks, in a strategy inspired by Bayesian updating.

This research is rooted in our developed capability to efficiently process large 
numbers of key hypotheses over many repeat experiments; our architecture 
(which we sketch out) is influenced by the design of modern HPC platforms. 

We structure our contribution as follows. We briefly provide the relevant preliminaries and then discuss prior literature (Section 2). In Section 3 we introduce our specialised attack framework and explain our attack strategy, including our method of assigning and updating ‘probability’ scores. Section 4 reports the results of our experiments with simulated leakage data. It explores what can be achieved by combining the outcomes of attacks against different target functions, and investigates the potential to combine different DPA strategies. In Section 5 we report the outcomes of some practical attacks against traces measured from an ARM7 microcontroller, including scenarios in which the precise locations of the intermediate targets in the traces are unknown.


1.2 Preliminaries: Differential Power Analysis 

We consider a ‘standard DPA attack’ scenario as defined in [16], and briefly explain the underlying idea as well as introduce the necessary terminology here. We assume that the power consumption $P$ of a cryptographic device depends on some internal value (or state) $F_{k^*}(X)$, which we call the target. The target is a function $F_{k^*}: \mathcal{X} \to \mathcal{Z}$ of some part of the known plaintext (a random variable $X \in \mathcal{X}$), and it depends on some part of the secret key $k^* \in \mathcal{K}$. Consequently, we have $P = L \circ F_{k^*}(X) + \varepsilon$. Here $L: \mathcal{Z} \to \mathbb{R}$ describes the data-dependent component, and $\varepsilon$ comprises the remaining power consumption, which can be modeled as independent random noise (this simplifying assumption is common in the literature; see, again, [16]). The attacker has $N$ power measurements corresponding to encryptions of $N$ known plaintexts $x_i \in \mathcal{X}$, $i = 1, \ldots, N$, and wishes to recover the secret key $k^*$. The attacker can accurately compute the internal values as they would be under each key hypothesis, $\{F_k(x_i)\}_{i=1}^{N}$ for $k \in \mathcal{K}$. He then uses whatever information he possesses about the true leakage function $L$ to construct a prediction model $M: \mathcal{Z} \to \mathbb{R}$.

DPA is motivated by the intuition that the model predictions under the correct key hypothesis should give more information about the true trace measurements than the model predictions under an incorrect key hypothesis. A distinguisher $D$ is some function which can be applied to the measurements and the hypothesis-dependent predictions in order to quantify the correspondence between them. For a given such comparison statistic $D$, the estimated vector from a practical instantiation of the attack is $\hat{\mathbf{D}} = \{\hat{D}_N(L \circ F_{k^*}(\mathbf{x}) + \mathbf{e},\, M \circ F_k(\mathbf{x}))\}_{k \in \mathcal{K}}$. Here $\mathbf{x} = \{x_i\}_{i=1}^{N}$ are the known inputs and $\mathbf{e} = \{e_i\}_{i=1}^{N}$ is the observed noise. The attack is then $o$-th order successful if $\#\{k \in \mathcal{K} : \hat{\mathbf{D}}[k^*] < \hat{\mathbf{D}}[k]\} < o$.

The success rate of a DPA attack is the probability that the correct key is ranked first by the distinguisher (the $o$-th order success rate is the probability that it is ranked among the first $o$ candidates). The guessing entropy is the expected number of candidates to test before reaching the correct one [24]. These metrics are often associated with the subkeys targeted in the ‘divide-and-conquer’ paradigm, rather than with the global key obtained when the partial outcomes are finally combined. We use the terms accordingly, unless explicitly stated otherwise.

Unless stated otherwise, we use the (estimate of the) Pearson correlation coefficient as distinguisher, in combination with a Hamming weight power model.
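The sketch below illustrates these definitions. It simulates $P = L \circ F_{k^*}(X) + \varepsilon$, computes the correlation distinguishing vector, and derives the rank, $o$-th order success and an estimate of the guessing entropy. To stay short and dependency-free, it makes several assumptions that are not taken from the text:
- a 4-bit key and target $F_k(x) = S(x \oplus k)$, with the PRESENT S-box standing in for the AES S-box;
- $L$ and $M$ both equal to the Hamming weight;
- Gaussian noise.

The guessing entropy is estimated as the average rank of $k^*$, counting $k^*$ itself, so a value of 1 means it is always ranked first.

```python
import math
import random

# Hypothetical toy target: the 4-bit PRESENT S-box stands in for the AES S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
KEYS = range(16)


def hw(v):
    """Hamming weight, used both as leakage L and as model M."""
    return bin(v).count("1")


def pearson(a, b):
    """Sample Pearson correlation; 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return 0.0 if va * vb == 0 else cov / math.sqrt(va * vb)


def simulate_traces(k_star, n, sigma, rng):
    """Known inputs x_i and leakages P_i = HW(S(x_i ^ k*)) + Gaussian noise."""
    xs = [rng.randrange(16) for _ in range(n)]
    ps = [hw(SBOX[x ^ k_star]) + rng.gauss(0.0, sigma) for x in xs]
    return xs, ps


def distinguishing_vector(xs, ps):
    """D[k] = |rho(P, M(F_k(x)))| for every key hypothesis k."""
    return [abs(pearson(ps, [hw(SBOX[x ^ k]) for x in xs])) for k in KEYS]


def rank_of(d, k_star):
    """1 + number of hypotheses scoring strictly higher than k*."""
    return 1 + sum(1 for k in KEYS if d[k] > d[k_star])


def is_oth_order_successful(d, k_star, o):
    """#{k : D[k*] < D[k]} < o."""
    return rank_of(d, k_star) - 1 < o


def guessing_entropy(k_star, n, sigma, repeats, seed=0):
    """Average rank of k* over independent simulated attacks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(repeats):
        xs, ps = simulate_traces(k_star, n, sigma, rng)
        total += rank_of(distinguishing_vector(xs, ps), k_star)
    return total / repeats


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, guessing_entropy(k_star=0x7, n=n, sigma=1.0, repeats=50))
```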


2 Related Literature 

Our work unites and advances three broad areas of the literature: resource-intensive side-channel strategies, post-SCA global key enumeration, and optimal combination of multiple sources of exploitable information.

Resource-intensive strategies. Such strategies have long been considered relevant mainly in single-trace settings (e.g. SPA attacks using algebraic methods [19,20]). This has only lately begun to change, with a few recent studies making use of modern graphics cards to speed up DPA attacks [3,18]. These articles essentially use GPUs within a single machine to speed up the processing of standard correlation DPA attacks. Our more ambitious approach is to distribute all the different components of a DPA attack (including workloads related to combination functionality) across several cards and several machines.

Post-SCA global key enumeration. Recent work by Veyrat-Charvillon et al. [25,26] focuses on the opportunity for a well-resourced adversary to view side-channel analysis as an auxiliary phase in an enhanced global key search, rather than as a stand-alone ‘win-or-lose’ attack. They present a search algorithm based on probability distributions for each of the subkeys (derived from DPA outcomes) [25].

In the case of profiling DPA with Gaussian templates, the true leakage distributions conditioned on each subkey hypothesis are known, and the probabilities are naturally produced by the Bayesian template matching. In the case of non-profiling DPA, these conditional leakage distributions are not known. An attack then produces not a probability distribution on the subkey candidates but a set of distinguisher scores (for example, correlations) associated with each candidate. Deriving probabilities from these scores is tricky. The method suggested in [25] is to use the hypothesis-dependent fitted leakage models after a non-profiled linear regression (‘stochastic’) attack as estimates of the ‘true’ conditional distributions.

However, non-profiled linear regression-based DPA specifically relies on the fact that the models built under incorrect key hypotheses are invalid. Consequently, the hypothesised functions do not describe the true data-dependent deterministic behaviour of the trace measurements, and so they are useless for (statistical) inference. For this reason, we opt for a different (‘safer’) heuristic for assigning ‘probability’ scores, as explained in Section 3.1.

Combining multiple sources of information. Whilst profiling attacks with multivariate Gaussian templates [7] naturally exploit multiple trace points, notions of ‘multivariate’ non-profiled DPA are varied in nature and intention. In particular, techniques designed to defeat protected implementations are best considered separately from attempts to enhance trace efficiency; we now focus on the latter. Already in an unprotected implementation, information on a given subkey generally leaks via more than one target function (AddRoundKey, SubBytes and MixColumns, for example, in the case of AES). Moreover, each of those target functions can be seen to leak at more than one trace point. In some cases, an adversary may even have the opportunity to observe multiple side channels simultaneously (timing, power consumption, electromagnetic radiation, etc.).

In the realms of both profiled and non-profiled DPA, several efforts have been made to combine information from multiple trace points so as to optimise the (trace) efficiency of an attack. Dimensionality reduction techniques such as principal component analysis or linear discriminant analysis can be used to transform the (often collinear) trace measurements into a reduced number of linearly uncorrelated variables, which together account for the important variation in the original data [1,4,22]. In this way it is even possible to combine information from different side channels, such as power and electromagnetic radiation [22]. Such methods can be very effective if the leakage associated with a particular intermediate value is concentrated into a single component. This gives rise to a stronger attack outcome than the ‘best’ of any individual point in the raw dataset.

A recent work by Hajra et al. [12] achieves a similar end via signal processing techniques. They show how to maximise the signal-to-noise ratio (SNR), and consequently improve the success rate of a univariate correlation DPA, by finding the linear Finite Impulse Response (FIR) filter coefficients for the leakage signal. Hutter et al. [13] also seek to enhance DPA efficiency by incorporating multiple sources of information, but take an entirely different approach: the combination is instead made at the trace acquisition stage. They measure the difference in consumption between two identical devices operating on different data. They reason that this difference has a higher data-dependent signal, because all environmental and operation-dependent noise is cancelled out.

Other suggestions involve performing separate attacks (against different targets, with different power models, or using different distinguishers) and then attempting to combine the distinguishing vectors themselves in a meaningful way. Doget et al. [?] present options for combining difference-of-means (DoM) style outcomes, in order to avoid the ‘suboptimality’ associated with attacks exploiting only one or a few of the bits at a time.

Whitnall et al. [28] try applying a multivariate extension of the mutual information to the AddRoundKey and an S-box jointly. They find that it is less efficient than the corresponding attack against the S-box alone, and moreover that it would not scale easily beyond a two-target scenario due to the complex nature of the statistic. Souissi et al. [?] suggest combining different distinguishers (namely, Pearson's and Spearman's correlation) applied against the same or different leakage points, by taking either the sum or the maximum of the two. They show that the sum works better, and is most effective if the trace points contain non-equivalent information.

Most directly related to our study is a paper by Elaabid et al. [?], which suggests (pointwise) multiplying correlation distinguishing vectors together in order to enhance distinguishing outcomes. They do this for four known leakage points, all relating to the same target function and power model. They find that it substantially improves over the outcomes achieved for any one of those leakage points taken individually. Our own combining approach is different: we first convert distinguishing vectors to ‘probability’ scores and view the multiplication as a Bayesian updating-like procedure. Moreover, we focus on combinations between different target values (rather than different leakage points for the same value), with potentially different-sized subkey hypotheses.

3 Methodology 

3.1 Assigning Probabilities 

The attempt of [25] to estimate ‘genuine’ probabilities on the subkey hypotheses in the non-profiled setting (see Section 2) uses the recovered models derived from a linear regression-based attack. This is expensive as well as unsuitable for our purpose. Ignoring the fact that the incorrect key hypotheses (using their approach) recover invalid models, the method of [25,26] may be viewed as one possible heuristic for assigning probabilities to key guesses. It preserves the ranking of the keys as they appear in the distinguishing vector produced by a non-profiled linear regression-based DPA. However, because of the nature of the formula used, it dramatically exaggerates the apparent distance between the high- and low-ranked key candidates. If the implied key is the right one, this reinforces the ‘correct’ result; but if it is not, it reinforces the misleading result. In their application (i.e. key enumeration) this may cause a less efficient key search. We, however, aim to combine distinguisher results, and hence key rankings. Mixing in a grossly exaggerated incorrect key ranking may destroy the effectiveness of the method.

Obtaining, from distinguishing vectors, scores that can be handled as though they were probabilities is an inherently heuristic task. Embracing this, we suggest that the conversion be kept simple and conservative. Our approach first transforms the distinguishing vector to be positive-valued with a baseline of zero, in a manner appropriate to the statistic (e.g. the absolute value for correlation, subtraction of the minimum for mutual information). It then normalises the scores to sum to one. We draw an analogy between this idea and the notion of subjective probability basic to a Bayesian view of statistics: both involve human-allocated scores derived from one's current best knowledge about reality.
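A minimal sketch of this conversion follows. It assumes that distinguishing vectors are plain Python lists indexed by key hypothesis. The helper names are hypothetical, and the uniform fallback for an all-zero vector is our own choice, not something specified above.

```python
def normalise(scores):
    """Scale non-negative scores so that they sum to one."""
    total = sum(scores)
    if total == 0:
        # Our own convention: no evidence at all -> uniform scores.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def correlation_to_probabilities(d):
    """Correlation: the absolute value gives a zero baseline."""
    return normalise([abs(v) for v in d])


def mi_to_probabilities(d):
    """Mutual information: subtracting the minimum gives a zero baseline."""
    lowest = min(d)
    return normalise([v - lowest for v in d])


if __name__ == "__main__":
    print(correlation_to_probabilities([0.5, -0.25, 0.25]))  # [0.5, 0.25, 0.25]
```

Both transformations are monotone in the score that the attack ranks on, so the key ranking of the original distinguishing vector is preserved.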

3.2 Combining Probabilities 

A Bayesian interpretation views probabilities as measures of uncertainty on hypotheses. Each time new information becomes available, the current state of knowledge can be updated via Bayes' theorem:

$$P(H \mid \mathbf{l}) = \frac{P(\mathbf{l} \mid H)\,P(H)}{P(\mathbf{l})},$$

where $H$ is some hypothesis (for example, a guess on the key, ‘$K = k$’), and $\mathbf{l}$ is some data (for example, a set of trace measurements $\mathbf{l} = L \circ F_{k^*}(\mathbf{x}) + \mathbf{e}$).

Suppose that we have probabilities for $(K = k)$ conditioned on two sources of data $\mathbf{l}_1, \mathbf{l}_2$, which are conditionally independent given $K = k$, so that $P(\mathbf{l}_1, \mathbf{l}_2 \mid K = k) = P(\mathbf{l}_1 \mid K = k)\,P(\mathbf{l}_2 \mid K = k)$. This is a natural assumption for the leakages of two target intermediate values. They are related via their shared dependence on the underlying key, but as long as they are separated in the trace, we would not expect any dependency in the residual variances once the key is taken into account. In this case, the task of combining the conditional probabilities is straightforward (see [?]):

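$$\begin{aligned}
P(K = k \mid \mathbf{l}_1, \mathbf{l}_2) &= \frac{P(\mathbf{l}_1, \mathbf{l}_2 \mid K = k)\,P(K = k)}{P(\mathbf{l}_1, \mathbf{l}_2)} \\
&= \frac{P(\mathbf{l}_1 \mid K = k)\,P(\mathbf{l}_2 \mid K = k)\,P(K = k)}{P(\mathbf{l}_1, \mathbf{l}_2)} \\
&= \frac{P(\mathbf{l}_1)\,P(\mathbf{l}_2)}{P(\mathbf{l}_1, \mathbf{l}_2)} \times \frac{P(K = k \mid \mathbf{l}_1)\,P(K = k \mid \mathbf{l}_2)}{P(K = k)}
\end{aligned}$$

The last step applies Bayes' theorem again, since $P(\mathbf{l}_i \mid K = k) = P(K = k \mid \mathbf{l}_i)\,P(\mathbf{l}_i) / P(K = k)$. The factor $\alpha = P(\mathbf{l}_1)P(\mathbf{l}_2)/P(\mathbf{l}_1, \mathbf{l}_2)$ does not depend on the key hypothesis. We can therefore treat it as a normalisation constant, which just needs to be computed so as to satisfy $\sum_{k \in \mathcal{K}} P(K = k \mid \mathbf{l}_1, \mathbf{l}_2) = 1$. In the typical case that all keys are a priori equally likely, the denominator in the second product term is $1/|\mathcal{K}|$ (constant for all key hypotheses) and simply gets absorbed into the normalising constant. Thus, conditional probabilities on the key candidates can be updated with any new, independent information via a simple multiplication-and-normalisation step.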
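The multiplication-and-normalisation step can be sketched as follows. It uses a hypothetical helper and assumes that the ‘probability’ vectors cover the same key space, indexed identically. The optional `prior` divides out $P(K = k)$ as in the last line of the derivation. With the default uniform prior, this division only rescales and is absorbed by the normalisation.

```python
def combine(p1, p2, prior=None):
    """Posterior-style update: p1[k] * p2[k] / prior[k], renormalised.

    p1, p2: 'probability' vectors from two conditionally independent attacks
    on the same key space; prior: P(K = k), uniform if omitted (must be > 0)."""
    if len(p1) != len(p2):
        raise ValueError("vectors must cover the same key hypotheses")
    if prior is None:
        prior = [1.0 / len(p1)] * len(p1)
    unnormalised = [a * b / q for a, b, q in zip(p1, p2, prior)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


if __name__ == "__main__":
    print(combine([0.5, 0.25, 0.25], [0.25, 0.5, 0.25]))  # about [0.4, 0.4, 0.2]
```

Further independent sources can be folded in by calling `combine` repeatedly on the running result.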

3.3 Parallelised Attack Architecture 

Combining multiple distinguishing vectors and attacking target functions involving 24 or more bits of the key are both computationally demanding tasks, and they necessitate the use of parallelised computation. We elected to use the OpenCL language and a set of graphics cards to parallelise three workloads: the computation needed to attack up to 32 bits of a key, the combination and normalisation of distinguishing vectors, and the statistics necessary for evaluating the effectiveness of each combined attack.

We took inspiration from modern HPC facilities, in which a significant amount of the computing power is delivered by GPUs. Hence our experimental setup consists of several (up to 6) workstations, each containing two discrete GPUs (the cost per machine is approximately 2000 GBP). These were various pairs of high-end AMD and Nvidia cards, installed in our own workstations or within the Bluecrystal Phase 3 supercomputing facility². In total, including all the functionality used to fully produce and analyse our experimental results, we were able to complete at least $2^{50}$ operations on combined distinguishing vectors in very roughly a couple of weeks of computation time.

[Figure: ‘Keys attacked per second, OpenCL kernel for attacking 32 bits of key using the MixColumns operation’, comparing an Intel i5 3550 @ 3.3 GHz (single core), 2x Intel Xeon E5 2670 @ 2.6 GHz (16 cores), 2x AMD Radeon R9 290X, and 2x AMD Radeon HD 7970 GHz Edition]

Fig. 1. Average keys per second recorded during DPA attacks on 32 bits of the input to the MixColumns operation for a variety of different sample sizes. Implementations are a ‘naive’ single-threaded CPU implementation, a parallelised OpenCL CPU-based implementation, and the two fastest OpenCL GPU implementations.

The most computationally demanding function was performing a 32-bit DPA attack on the MixColumns operation. Here we decided to share the cost over multiple GPUs. Each work group inside a single card computes a partial piece of the distinguishing vector using a portion of the traces and a subset of the key hypotheses, followed by a global reduction to compute the final vector. Fig. 1 shows the performance of our OpenCL attack implementation for a variety of devices, in terms of the number of key hypotheses tested per second.
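The sketch below shows one common way to split a correlation computation across work groups. It is written in Python rather than OpenCL to illustrate the general idea, and it does not reproduce the kernel benchmarked in Fig. 1; all names are hypothetical. Each chunk of traces contributes six running sums, and chunks are merged by element-wise addition (the ‘global reduction’). The correlation is computed once at the end. Key hypotheses are independent of one another, so each device can handle its own subset of keys without communication.

```python
import math
import random


def partial_sums(pred, trace):
    """Six running sums for one chunk of traces (one work group's share)."""
    return (len(pred), sum(pred), sum(trace),
            sum(p * p for p in pred), sum(t * t for t in trace),
            sum(p * t for p, t in zip(pred, trace)))


def merge(a, b):
    """Reduction step: sums from two chunks simply add up."""
    return tuple(u + v for u, v in zip(a, b))


def correlation_from_sums(s):
    """Pearson's correlation from the merged sums."""
    n, sp, st, spp, stt, spt = s
    cov = n * spt - sp * st
    return cov / math.sqrt((n * spp - sp * sp) * (n * stt - st * st))


def chunked_distinguishing_vector(preds_by_key, trace, chunk):
    """|rho| per key hypothesis, computed chunk by chunk, then reduced."""
    d = []
    for pred in preds_by_key:          # keys are independent: split freely
        acc = (0, 0, 0, 0, 0, 0)
        for start in range(0, len(trace), chunk):
            stop = start + chunk
            acc = merge(acc, partial_sums(pred[start:stop], trace[start:stop]))
        d.append(abs(correlation_from_sums(acc)))
    return d


if __name__ == "__main__":
    rng = random.Random(0)
    trace = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    preds = [[rng.randrange(9) for _ in range(1000)] for _ in range(4)]
    print(chunked_distinguishing_vector(preds, trace, chunk=128))
```

The single-pass formula used here can lose precision when trace values carry a large offset. A Welford-style merge is more robust, at the cost of a slightly more involved reduction.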

We note that these benchmark timings are not likely to be optimal. We did not 
try to improve the memory coalescence of our kernels, nor did we try to perform 
any other non-trivial optimisation beyond maximising kernel occupancy, and so 
there may be considerable headroom in key-search throughput still to be gained. 
It is clear from the extremely cheap price for a dual GPU setup, coupled with the 
considerable performance increases observed with the introduction of new GPU 
architectures, that an adversary can acquire very large side-channel key-search 
capabilities at minimal financial cost. 

Bartkewitz et al. [3] use Nvidia's CUDA technology and a Tesla C2070 to parallelise 8-bit CPA attacks on the SubBytes operation, and focus on maximising trace data throughput in an 8-bit setting. Our more ambitious goal is to optimise for large key-search problems as well as for trace data throughput. In this context, Moradi et al. [18] utilise 4 Nvidia Tesla GPUs to attack 32 bits of key using 60,000 traces, and are able to attack a single time point every 33 minutes. A direct comparison is not possible, as we are using slightly more modern hardware and the exact computational costs included in the benchmarking are not clear. However, we might expect to be able to perform a similar attack in approximately 20 minutes.

² Bluecrystal is managed by the Advanced Computing Research Centre at the University of Bristol; see http://www.bris.ac.uk/acrc/
4 Experiments with Simulated Data

The goal of our combining strategy is to reduce (relative to ‘standard univariate 
DPA’) the guessing entropy on the subkeys (and consequently on the global 
key). Many types of combination are possible. We study the effect of combining 
outcomes from different targets as well as, secondarily, the effect of combining 
outcomes from different distinguishers applied to the same target. We do this 
initially for simulated trace measurements so that we can take into account 
different noise levels (i.e. by varying the SNR) as well as the impact of using an 
imperfect power model. Both aspects have practical relevance. 


4.1 Combining Outcomes from Different Targets 

We simulated leakages of AES AddRoundKey, SubBytes, and three 8-bit interim values in the computation of MixColumns. Here $\mathrm{state}_i$ denotes the $i$-th state byte after the SubBytes operation, and GFm2 denotes doubling in Rijndael's finite field. The three interim values are:
- one involving two key bytes, namely $\mathrm{GFm2}(\mathrm{state}_i \oplus \mathrm{state}_{i+1})$;
- one involving three key bytes, namely $\mathrm{GFm2}(\mathrm{state}_i \oplus \mathrm{state}_{i+1}) \oplus \mathrm{state}_{i+1} \oplus \mathrm{state}_{i+2}$;
- one involving four key bytes, namely $\mathrm{GFm2}(\mathrm{state}_i \oplus \mathrm{state}_{i+1}) \oplus \mathrm{state}_{i+1} \oplus \mathrm{state}_{i+2} \oplus \mathrm{state}_{i+3}$.³
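For reference, the three interim MixColumns values can be computed as follows. The sketch makes three assumptions:
- the four inputs are the post-SubBytes bytes of one column (ShiftRows is ignored, i.e. each column byte is taken as $S(p_i \oplus k_i)$);
- indices are taken modulo 4;
- the AES S-box is computed from its definition (field inverse followed by the affine map), to avoid a 256-entry table.

The function names are hypothetical. The four-key-byte value coincides with the $i$-th output byte of MixColumns, as the usage line shows on a commonly used test column.

```python
def xtime(a):
    """GFm2: multiplication by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def gmul(a, b):
    """Multiplication in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def ginv(a):
    """Multiplicative inverse as a^254 (0 maps to 0, as in AES)."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gmul(result, a)
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox(x):
    """AES S-box: field inverse followed by the affine transformation."""
    b = ginv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [aes_sbox(x) for x in range(256)]


def state_bytes(plain, key):
    """Post-SubBytes bytes of one column; ShiftRows ignored (assumption)."""
    return [SBOX[p ^ k] for p, k in zip(plain, key)]


def mc_two(s, i):
    """GFm2(state_i ^ state_{i+1}): depends on two key bytes."""
    return xtime(s[i] ^ s[(i + 1) % 4])


def mc_three(s, i):
    """mc_two ^ state_{i+1} ^ state_{i+2}: depends on three key bytes."""
    return mc_two(s, i) ^ s[(i + 1) % 4] ^ s[(i + 2) % 4]


def mc_four(s, i):
    """mc_three ^ state_{i+3}: the i-th MixColumns output (four key bytes)."""
    return mc_three(s, i) ^ s[(i + 3) % 4]


if __name__ == "__main__":
    column = [0xDB, 0x13, 0x53, 0x45]   # commonly used MixColumns test column
    print([hex(mc_four(column, i)) for i in range(4)])
    # ['0x8e', '0x4d', '0xa1', '0xbc']
```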

In the case of the 16-bit multi-target attack we necessarily hypothesise over
two key bytes (in order to incorporate the MixColumns leakage). The experiments
each involve two AddRoundKey correlation-based DPA attacks (which are
then combined into probabilities on the full 16-bit subkeys via multiplication),
two S-box attacks (combined likewise), and the one MixColumns attack, before
multiplying each possible pair of target functions together, as well as multiplying
all three together. Similarly, for the 24-bit multi-target attack we hypothesise
over three key bytes. The experiments in the 24-bit attack then involve three
AddRoundKey attacks, three S-box attacks, and the one attack on an interim
MixColumns value. We amalgamate probabilities by multiplication as in the
16-bit case. The 32-bit multi-target attack proceeds in the same fashion: we
combine four AddRoundKey attack results and four SubBytes results with the
MixColumns attack result. The graphs in Fig. 2(a) show these different scenarios
for a single column of the AES state.
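A minimal sketch of the combination step, assuming each DPA outcome has already been turned into a heuristic 'probability' vector. The baseline-zero, sum-to-one transform used here is the one mentioned for distinguishing vectors in Section 4.2; applying it to the correlation attacks is an assumption, and the helper names are hypothetical. Two 8-bit vectors are lifted to a 16-bit vector by an outer product, and aligned vectors from different targets are multiplied element-wise, which implicitly treats the targets as independent evidence.

```python
from itertools import product


def normalise(scores):
    """Shift to a zero baseline and rescale to sum to one (heuristic 'probabilities')."""
    base = min(scores)
    shifted = [s - base for s in scores]
    total = sum(shifted)
    if total == 0:  # flat vector carries no information: return uniform scores
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in shifted]


def lift_pair(p_a, p_b):
    """Outer product: candidate (k_a, k_b) lands at index k_a * len(p_b) + k_b."""
    return [x * y for x, y in product(p_a, p_b)]


def combine(vectors):
    """Element-wise product of aligned score vectors, renormalised to sum to one."""
    out = [1.0] * len(vectors[0])
    for v in vectors:
        out = [o * x for o, x in zip(out, v)]
    total = sum(out)
    return [o / total for o in out] if total > 0 else out
```

For a 16-bit subkey one would form, e.g., `combine([lift_pair(ark0, ark1), lift_pair(sbox0, sbox1), mix])`, where all inputs are normalised and `mix` is a hypothetical 65,536-entry MixColumns vector indexed as `(k_a << 8) | k_b`, matching the ordering of `lift_pair` for 256-entry inputs.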

In the following paragraphs we analyse these graphs with respect to three
questions that are relevant for practice. Firstly, what is the impact of a (low)
SNR on our multi-target strategy? As we base our DPA attacks on
correlation distinguishers, we would hope that, as for single-target attacks,
multi-target attacks will ‘scale’ with the SNR. Secondly, we are interested in
how the size of the key hypotheses affects the guessing entropy, and lastly, in
how multi-target attacks behave when the attacker’s power model is imprecise.

3 This targets a single intermediate byte. The relative effectiveness of combining all
four attacks on all the possible intermediate bytes would also be interesting to
investigate, but generating results requires time and so is left as future work.
[Fig. 2. Simulation results. (a) Outcomes for attacks combining several targets using up to 32-bit key hypotheses (panels include 16-bit targets at SNR = 1.0, 16-bit targets at SNR = 0.0625 and 32-bit targets at SNR = 0.0625; x-axis: number of traces; legend: S-box, AddRoundKey + MixColumns, S-box + MixColumns, AddRoundKey + S-box, all three). (b) Outcomes for attacks combining several distinguishers for the same target (the S-box output) (left: imprecise power model, 16-bit targets, SNR = 0.0625; right: SNR = 0.0625).]


Impact of SNR. The top two graphs in Fig. 2(a) show the subkey guessing
entropies (for a 16-bit key guess) as the number of traces increases, for the attacks
against simulated Hamming weight leakage at two SNR levels. Aside from the
fact that all attacks require more traces as the SNR decreases (as
we would expect), the scenarios exhibit similar outcomes. The attacks on S-boxes
are effective at reducing uncertainty on the key (the results for these are printed
in red), but are clearly outperformed by all three ‘bivariate’ combinations, even
the one between the MixColumns sub-computation and AddRoundKey. The
combination of all three further reduces the enumeration work required.


Impact of larger distinguishing vectors. The top right and the bottom graphs in
Fig. 2(a) show the subkey guessing entropies for increasing subkey sizes (16-bit
in the top right, 24-bit in the bottom left, and 32-bit in the bottom right). In all
three experiments the multi-target attacks outperform the single-target attacks.
Note that the guessing entropy range naturally increases with the size of the
key hypothesis and is in no way an indicator of attack degradation. For the
16-bit attack the guessing entropy is out of $2^{16}$, and eight such guesses need to
be combined to get a global key with guessing entropy between 1 and $2^{128}$. For
the 32-bit attack the guessing entropy is out of $2^{32}$, but only four such guesses
need to be combined. It is the global guessing entropy which ultimately matters,
and the subkeys always need to be combined at some point; incorporating
information at (e.g.) the 32-bit level simply increases the scope of intermediate
targets exploitable by the attacker. For both hypothesis sizes, the outcomes
suggest that we are able to succeed with roughly half the number of leakage traces
when using the best multi-target attack (for a fixed subkey guessing entropy, the
best multi-target attacks require roughly half of the traces required by the best
single-target attack). It is possible to estimate global key guessing entropies
based on these results by assuming that the attacks on the other ‘chunks’ of
the key would behave identically. For instance, in the 16-bit case, if all eight
16-bit attacks give identical outcomes, we could estimate global key guessing entropies
by raising the result of a single 16-bit attack to the power eight. However, this
does not necessarily translate into practice, so we will instead show actual global
key guessing entropies when we come to discuss attacks on real data.
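The guessing-entropy bookkeeping described above can be sketched as below (hypothetical helper names; a sketch, not the authors' code). The rank of the correct subkey is its 1-based position when candidates are sorted by decreasing score, the guessing entropy is the average rank over repeat experiments, and the naive global estimate multiplies the subkey guessing entropies (equivalently, adds their base-2 logarithms), which assumes the key chunks behave independently; as noted above, this need not hold in practice.

```python
import math


def key_rank(scores, correct):
    """1-based rank of the correct candidate; ties are resolved in its favour."""
    target = scores[correct]
    return 1 + sum(1 for s in scores if s > target)


def guessing_entropy(ranks):
    """Average rank of the correct candidate over repeated experiments."""
    return sum(ranks) / len(ranks)


def naive_global_log2(subkey_entropies):
    """log2 of the product of subkey guessing entropies (independence assumption)."""
    return sum(math.log2(g) for g in subkey_entropies)


# Eight 16-bit chunks, each left with a (hypothetical) guessing entropy of 2^10:
print(naive_global_log2([2.0 ** 10] * 8))  # 80.0
```

Working in log2 avoids forming products such as $2^{128}$ explicitly and matches how the global guessing entropies are usually plotted.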


Impact of imperfect power model. The left picture of Fig. 2(b) shows the
outcomes (against a 16-bit subkey target; the legend from Fig. 2(a) applies) in the
case where the Hamming weight is not a perfect match to the leakage, because
of the presence of a constant reference state $R$ (representing an address, for
example) of Hamming weight 1. The most striking impact of this distortion occurs for
attacks that include AddRoundKey as a target, which are no longer able to
identify the correct key as a likely candidate. This is because, for a plaintext byte $p$ and key byte $k$,
$\mathrm{HD}(p \oplus k, R) = \mathrm{HW}(p \oplus (k \oplus R))$: the Hamming distance
of the AddRoundKey output from the reference state when the correct key is guessed
is the same as the Hamming weight of the AddRoundKey output when the key guess
is the correct key XORed with the reference state. In effect, an incorrect key
masquerades as the correct one, and correlation DPA against AddRoundKey
will naturally prefer it. (The same cannot happen for the S-box, for
example, because the key XOR is inside the highly nonlinear transformation,
with the Hamming distance being taken afterwards.)

Nonetheless, in this case where the reference state is itself of low weight,
incorporating AddRoundKey information still produces marginal reductions in
the guessing entropies after the S-box and MixColumns attacks (separately, and combined).




L. Mather, E. Oswald, and C. Whitnall 


Greater imprecision of the power model will more strongly affect AddRoundKey
attacks; it may be advisable to exclude AddRoundKey as a target in such cases.
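The masquerading effect on AddRoundKey is easy to reproduce in a noise-free toy simulation. The sketch below assumes leakage equal to the Hamming distance between the AddRoundKey output and a weight-1 reference state, i.e. $\mathrm{HW}(p \oplus k \oplus R)$; the key byte 0x3C, the reference state 0x10 and all helper names are hypothetical. Correlating against the Hamming-weight model $\mathrm{HW}(p \oplus g)$ then peaks at $g = k \oplus R$, while the correct key only reaches $(8 - 2 \cdot 1)/8 = 0.75$.

```python
import math


def hw(x):
    """Hamming weight of a byte."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def addroundkey_cpa(leakage, plaintexts):
    """Correlation DPA on AddRoundKey with a Hamming-weight model: one score per key guess."""
    return [pearson([hw(p ^ g) for p in plaintexts], leakage) for g in range(256)]


true_key, ref = 0x3C, 0x10  # hypothetical key byte and reference state of weight 1
plaintexts = list(range(256))
leakage = [hw(p ^ true_key ^ ref) for p in plaintexts]  # HD(p ^ k, R), noise-free
scores = addroundkey_cpa(leakage, plaintexts)
best = max(range(256), key=scores.__getitem__)
print(hex(best), round(scores[true_key], 2))  # 0x2c 0.75
```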

4.2 Combining Outputs from the Same Target 

One might ask whether or not the outcomes of different distinguishing statistics 
or power models can likewise be combined to some advantage. 

Using different distinguishing statistics. Suppose we run three different attacks
against the leakage of an AES S-box, e.g. mutual information [?], Kolmogorov-Smirnov [?],
and the variance ratio [23], all using a Hamming weight power
model. The distinguishing vectors are transformed to have a baseline of zero and
to sum to one, for use as heuristic ‘probability’ scores. We would then like to
know whether the combined outcomes improve upon the individual ones.

The right picture in Fig. 2(b) shows what happens when we attempt this in the
example scenario of Hamming weight leakage with SNR 0.0625. When the same
measurements are used for all of the attacks, combining the outcomes actually
increases the guessing entropy. By contrast, when independent measurements are
used in each case (i.e., each distinguisher is applied against a different
point in the trace leaking the same information but with independent noise),
there is some scope to refine the information on the key by combining outcomes,
although all three outcomes together on average produce worse results than the
best combination of two. We found that it was generally the addition of mutual
information which degraded the outcome, as it required substantially more data
to estimate to an equivalent degree of precision.

This is very much in line with what we might expect, and acts as a noteworthy
warning: it is the addition of new information which improves attack
outcomes. Exploiting the same measurements using the same power model but
with different distinguishers does not contribute anything further. In the context
of our heuristic ‘probability’ distributions such a practice could be particularly
dangerous, as it still serves to exaggerate the magnitude of the peaks, thus giving
a false sense of increased certainty. Note that the multiplication step implicitly
assumes independence of the separate score vectors, which is clearly violated
when they are all based on the same leakage information.
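A small numerical illustration of this warning (the score vector is made up): multiplying a heuristic 'probability' vector with copies of itself, which is effectively what happens when several distinguishers see the same measurements, never changes the ranking but inflates the top score, suggesting more certainty than the data support.

```python
def combine(vectors):
    """Element-wise product of score vectors, renormalised to sum to one."""
    out = [1.0] * len(vectors[0])
    for v in vectors:
        out = [o * x for o, x in zip(out, v)]
    total = sum(out)
    return [o / total for o in out]


scores = [0.4, 0.3, 0.2, 0.1]                # one distinguisher's (hypothetical) scores
twice = combine([scores, scores])            # the same evidence counted twice
thrice = combine([scores, scores, scores])   # ... and three times
print([round(x, 3) for x in twice])          # [0.533, 0.3, 0.133, 0.033]
print([round(x, 3) for x in thrice])         # [0.64, 0.27, 0.08, 0.01]
```

The peak grows from 0.4 to 0.64 although no new information about the key has been added.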

Using different power models. In light of the ineffectiveness of combining
information about the same target, we briefly revisit previous work by Bevan
and Knudsen [5]. They suggest combining eight difference-of-means attacks,
each targeting a distinct bit of the intermediate value, by ‘summing over the
distinguisher results’ (in our approach we convert them into ‘probability’
distributions on the set of $2^8$ subkeys, as per Section [?]). Since each attack exploits
a separate portion of the overall leaked value, we may expect that each new
bit attacked helps to further reduce the candidate search space, and, indeed,
our experiments confirm this (see Appendix A). Such a technique is hence very
useful in leakage scenarios which are unfamiliar to an attacker, which is often
the case when attacking dedicated hardware.
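A sketch of the bit-wise combination under loudly stated assumptions: noise-free Hamming-weight leakage of the S-box output, every plaintext byte observed exactly once (so both partitions are always non-empty), the absolute difference of means as the per-bit score, and the baseline-zero, sum-to-one transform before multiplying. The AES S-box is built from its algebraic definition (inverse in GF(2^8) followed by the affine map); the key value and helper names are hypothetical.

```python
def gmul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def ginv(a):
    """Multiplicative inverse computed as a^254 (maps 0 to 0, as AES requires)."""
    result, e = 1, 254
    while e:
        if e & 1:
            result = gmul(result, a)
        a = gmul(a, a)
        e >>= 1
    return result


def rotl8(x, r):
    return ((x << r) | (x >> (8 - r))) & 0xFF


# AES S-box: field inverse followed by the affine transformation.
SBOX = [b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
        for b in (ginv(a) for a in range(256))]


def dom_scores(leakage, plaintexts, bit):
    """|difference of means| per key guess, partitioning on one bit of S(p ^ g)."""
    scores = []
    for g in range(256):
        ones, zeros = [], []
        for p, leak in zip(plaintexts, leakage):
            (ones if (SBOX[p ^ g] >> bit) & 1 else zeros).append(leak)
        scores.append(abs(sum(ones) / len(ones) - sum(zeros) / len(zeros)))
    return scores


def normalise(v):
    """Zero baseline, sum to one."""
    base = min(v)
    shifted = [x - base for x in v]
    total = sum(shifted)
    return [x / total for x in shifted] if total > 0 else [1 / len(v)] * len(v)


def combine(vectors):
    """Element-wise product of score vectors, renormalised."""
    out = [1.0] * len(vectors[0])
    for v in vectors:
        out = [o * x for o, x in zip(out, v)]
    total = sum(out)
    return [o / total for o in out] if total > 0 else out


true_key = 0xA7  # hypothetical key byte
plaintexts = list(range(256))
leakage = [bin(SBOX[p ^ true_key]).count("1") for p in plaintexts]
per_bit = [dom_scores(leakage, plaintexts, b) for b in range(8)]
ranks = []  # rank of the true key after combining the first 1, 2, ..., 8 bit attacks
for n_bits in range(1, 9):
    combined = combine([normalise(s) for s in per_bit[:n_bits]])
    ranks.append(1 + sum(1 for x in combined if x > combined[true_key]))
print(ranks)
```

For the correct key every per-bit difference of means equals exactly 1 here, since the other seven bits of a bijective S-box output average 3.5 in both partitions; real traces add noise and other leakage, so this only illustrates the mechanics.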
5 Practical Attacks

We tested our strategy in practice using a dataset of 10,000 traces from an
ARM7 microcontroller running an unprotected implementation of AES. The
10,000 traces were divided into 200 sets of 50 traces each, in order to conduct
sufficient repeat experiments to report reasonably precise estimates of the guessing
entropies, in the same vein as our simulated attacks. Multi-target attacks, like
multivariate attacks, are greatly helped by knowledge of where the
attacked intermediate values leak in the traces. Consider for instance a
(multivariate) template attack: it is much harder for an adversary to conduct such an
attack when a similar device is available in the profiling phase but not the exact
implementation (of, say, AES). In such a case an adversary could still build
templates for microprocessor instructions during profiling, but in the attack phase
the adversary would need to find the specific trace points at which to apply the
templates. Similarly, knowing precisely where the single-target leakages occur is
helpful for a multi-target attack. We consequently focus initially on a ‘known
point’ scenario and then make a first attempt at relaxing this assumption.


5.1 Practical Attacks against Known Interesting Points 


We applied two multi-target attacks (one involving 16-bit and one involving 32-bit key
hypotheses) under the assumption that interesting trace points are known, running
200 repeat experiments for increasing sample sizes of up to 50 traces. For each 16-bit
subkey, correlation DPA attacks were performed against the two corresponding
AddRoundKey operations, the S-boxes and the MixColumns sub-computation
$\mathrm{GFm2}(\mathrm{state}_i \oplus \mathrm{state}_{i+1})$, where $\mathrm{state}_i$ is the state byte corresponding to the
$i$th key byte after the S-box substitutions and (in this implementation) the
ShiftRows operation. For each 32-bit subkey, correlation DPA attacks were
performed against four AddRoundKey operations, four S-boxes, and the 32-bit
MixColumns computation $\mathrm{GFm2}(\mathrm{state}_i \oplus \mathrm{state}_{i+1}) \oplus \mathrm{state}_{i+1} \oplus \mathrm{state}_{i+2} \oplus \mathrm{state}_{i+3}$.

The first two graphs in Fig. 3(a) show the guessing entropies on the first
key-byte pair and the final global guessing entropies, estimated by multiplying
the eight subkey guessing entropies together (the outcomes for the other seven
subkey guesses can be found in the Appendix of the full version of our paper,
see [?]). They largely, but not perfectly, match up with our observations for
simulated traces. This is an important point: theory and practice rarely align
perfectly, even in the case of a relatively ‘simple’ platform like the ARM7. In the
practical experiments, AddRoundKey and the MixColumns sub-value are
consistently unable to identify the correct key alone (at least, not within 50 traces).
However, the two together produce guessing entropies that rival the effectiveness
of the S-box attack, and both produce improvements in combination with the
S-box. All three together produce the best guessing entropies for many of the
16-bit subkeys, although they are sometimes outperformed by the two-target
S-box + AddRoundKey attacks, which achieve a marginal advantage overall.

4 The more refined rank estimation methodology of [26] indicates that this simple
method of approximating global guessing entropies underestimates the rank by 20
to 40 binary orders of magnitude.

The second two graphs in Fig. 3(a) show the guessing entropies for the first
32-bit subkey and the final global guessing entropies. The global entropies were
estimated by multiplying the four subkey entropies together. We observed
varying behaviour for our combined attacks on different subkeys; our targeted
MixColumns computation does not leak nearly as much information in the middle 8
bytes of the state as it does in the first and final 4 bytes. Consequently, despite
observing (as suggested by our simulated attacks) strong performance of the
combined three-target attacks in the latter two cases, in the global setting this
advantage is diminished, and the ‘trivariate’ attack performs similarly
to the combined four-byte S-box + AddRoundKey attacks. It is noteworthy that
even in the presence of this variable leakage, most combined attacks outperform
the S-box attack. Graphs and data for each of the four separate subkey attacks
can be found in the Appendix of the full version of our paper [?].


5.2 Practical Attacks where Interesting Points Are A Priori Unknown

The natural next question to ask is whether we can relax the assumption that
the leakage points are precisely known. We made some preliminary inroads using
‘desktop-level’ resources (whilst our GPU machines were occupied with other
experiments), focusing, for computational feasibility, on 8-bit key hypotheses.
The three targets we selected to combine were AddRoundKey, the S-box outputs,
and the interim MixColumns value $\mathrm{GFm2}(\mathrm{state}_i \oplus \mathrm{state}_{i+1})$, under the assumption
that the second of the two key bytes involved is known.

We relaxed the ‘known point’ assumption by visually inspecting the AES
traces in order to identify the interval in which each of the three target functions
is contained. The first round takes about 1,400 clock cycles in total, and the
(non-overlapping) windows we selected for experimentation had widths of 240,
230, and 180 for AddRoundKey, SubBytes and MixColumns respectively. Within
these windows we took an ‘exhaustive search’ approach. First, we subjected
each point to a standard DPA attack against the associated target function, and
computed the ‘probability’ scores. We then combined them pairwise in each of
the three possible configurations, and finally we combined all three. We tried
two strategies: in the first, we took (for each configuration) the combined vector
with the largest peak as the one most likely to correspond to the correct key and
pair/triple of leakage points; in the second, we took the $N_t$ combined vectors
with the largest peaks and multiplied these together (for different values of $N_t$),
so achieving a sort of ‘majority vote’.
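The two point-selection strategies can be sketched as follows, assuming each target's window has already been reduced to one heuristic 'probability' vector per trace point (function names are hypothetical). `top_combinations` performs the exhaustive search over one point per window; the 'maximum peak' strategy keeps the single best product, and the 'majority vote' multiplies the $N_t$ best. If the window widths above are counted in trace points, the full three-target search visits roughly $10^7$ point triples, so a practical version would need a faster, array-based implementation.

```python
import heapq
from itertools import product


def combine(vectors):
    """Element-wise product of aligned score vectors, renormalised to sum to one."""
    out = [1.0] * len(vectors[0])
    for v in vectors:
        out = [o * x for o, x in zip(out, v)]
    total = sum(out)
    return [o / total for o in out] if total > 0 else out


def top_combinations(windows, n_top):
    """Combine one point from every window; keep the n_top products with the largest peak."""
    scored = (
        (max(c), idx, c)  # idx breaks ties so vectors are never compared
        for idx, c in enumerate(combine(choice) for choice in product(*windows))
    )
    return [c for _, _, c in heapq.nlargest(n_top, scored)]


def max_peak(windows):
    """Strategy 1: the single combined vector with the highest peak."""
    return top_combinations(windows, 1)[0]


def majority_vote(windows, n_top):
    """Strategy 2: multiply the n_top highest-peaked combined vectors together."""
    return combine(top_combinations(windows, n_top))
```

Here `windows` is a list with one entry per target, each entry being the list of per-point score vectors inside that target's window.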

The left side of Fig. 3(b) shows the average guessing entropy for each of the
attacks using the first ‘maximum peak’ strategy. The AddRoundKey attack
performs very badly in an unknown point scenario. Further analysis of the trace
window reveals that there are other points exhibiting strong correlations with
$\mathrm{AddRoundKey} \oplus R$, for $R$ some other (possibly address?) value in $\{0, \dots, 255\}$ (see
Fig. 5 in Appendix B). Moreover, at these points the correct-key correlations
are low, so that the contribution to the combined leakage is highly distorting (as
opposed to when an ‘imperfect but close’ leakage prediction is made, in which
case the combination can still improve distinguishability). In the presence of
such misleading leakage information, it is reassuring that the attack outcomes
are robust to the combining step.

[Fig. 3. Practical results. (a) Outcomes for multi-target attacks in a known points scenario (legend: S-box, AddRoundKey + MixColumns, S-box + MixColumns, AddRoundKey + S-box, all three). (b) Outcomes for multi-target attacks in a known interval scenario.]


5 Note that the leakage of the S-box is less vulnerable to such distortions: a non-zero 
reference state will not masquerade as an alternative key hypothesis, as the key 
addition happens inside the S-box. 
The combined MixColumns and S-box attack exhibits lower guessing entropies
than either of the two taken individually. The trivariate attack (as expected
from the above) does not add much to this, but again we note that the
inclusion of AddRoundKey at least does not seem to harm the outcome.

The right side of Fig. 3(b) shows the advantage gained by multiplying the
top-ranked few ‘probability’ vectors for the trivariate attack, as well as (for
comparison) for the S-box attack on its own. Interestingly, even the addition
of the second-ranked vector degrades the S-box attack, whereas the product
of the top-ranked triples reduces the guessing entropy at least up
to $N_t = 20$. The resulting overall improvement over the S-box outcome on its
own indicates that this is a potentially worthwhile strategy for key recovery in an
unknown point scenario.

From a practical perspective, a useful way forward for multi-target
attacks would be to ‘try out’ (for a concrete device and implementation) different
combinations of targets, and different point selection strategies, to see which give
the best results. We want to caution against drawing too many conclusions from
these last experiments: they clearly represent a first step only!


6 Conclusion 

We have shown how to amalgamate single-target ‘standard’ DPA attacks
(using a correlation distinguisher and a Hamming weight power model) into
multi-target attacks capable of increasing information on the correct key by combining
DPA outcomes that are treated as heuristic probabilities. Leveraging our
modern HPC-inspired computing platform, we are able to efficiently handle key
hypotheses of up to 32 bits using a small cluster of simple workstations containing
consumer graphics cards. Such a capability allows us to combine many
intermediate targets; in this work we made the first serious attempt to explore the
characteristics of successful combinations. Our results indicate that combining
S-box + AddRoundKey, or additionally including an intermediate MixColumns
computation, typically produces the strongest results. Multi-target attacks scale
predictably with noise and are robust with regard to imprecise power models.
Our investigation focuses mainly on ‘known’ (leakage) point attacks,
in line with assumptions generally made for multivariate attacks. When leakage
points are not known, an exhaustive search in suitable visually identified trace
windows, together with a ‘majority vote’-style approach to decide on ‘peaks’,
leads to improved practical attacks even in this challenging scenario.

Our definition of multi-target attacks and our intuitive and efficient combination
technique open up many interesting new research questions: e.g. is there any
single best combination of intermediate values for a given cipher? How effectively 
can we combine power and EM attack results in this way? Could we even move 
further on and include results from the second encryption round? What other 
strategies for combining in unknown point scenarios exist? How could we use 
this against implementations when masking and hiding are used? For better or 
worse, these are “interesting times” — to call to mind the fabled Chinese curse. 
A Combining Difference-of-Means Outcomes

Fig. 4 shows the reduction in subkey guessing entropy as an increasing number
of difference-of-means attacks (against different individual bits) are combined via our
strategy.
Fig. 4. Combining the outcomes of up to eight difference-of-means attacks against
Hamming weight leakage of the AES S-box


B Unknown Point Attacks: Problem of Rival Peaks 

Fig. 5 illustrates the difficulty of separating the true key from strong rival
candidates when the relevant ‘interesting points’ in the trace are not known. As
described in Section 5.2, this introduces distorting information into the point search,
which reduces the ability to increase an attack’s effectiveness by the addition of
AddRoundKey outcomes.






Fig. 5. Left: Example of a fixed XOR offset from the key producing a rival peak in the 
AddRoundKey correlation attack against the ARM7 traces. Right: The evolution of an 
AddRoundKey correlation attack against the ARM7 traces, showing the confounding 
effect of strong rival candidates. 


GLV/GLS Decomposition, Power Analysis, 
and Attacks on ECDSA Signatures 
with Single-Bit Nonce Bias 


Diego F. Aranha¹, Pierre-Alain Fouque², Benoît Gérard³,⁴,
Jean-Gabriel Kammerer³,⁵, Mehdi Tibouchi⁶, and Jean-Christophe Zapalowicz⁷

¹ Institute of Computing, University of Campinas
dfaranha@ic.unicamp.br
² Université de Rennes 1 and Institut Universitaire de France
fouque@irisa.fr
³ DGA-MI, Rennes
⁴ IRISA, benoit.gerard@irisa.fr
⁵ IRMAR, Université de Rennes 1
jean-gabriel.kammerer@m4x.org
⁶ NTT Secure Platform Laboratories
tibouchi.mehdi@lab.ntt.co.jp
⁷ Inria
jean-christophe.zapalowicz@inria.fr


Abstract. The fastest implementations of elliptic curve cryptography
in recent years have been achieved on curves endowed with nontrivial
efficient endomorphisms, using techniques due to Gallant-Lambert-Vanstone
(GLV) and Galbraith-Lin-Scott (GLS). In such implementations,
a scalar multiplication $[k]P$ is computed as a double multiplication
$[k_1]P + [k_2]\psi(P)$, for $\psi$ an efficient endomorphism and $k_1, k_2$ appropriate
half-size scalars. To compute a random scalar multiplication, one
can either select the scalars $k_1, k_2$ at random, hoping that the resulting
$k = k_1 + k_2\lambda$ is close to uniform, or pick a uniform $k$ instead and
decompose it as $k_1 + k_2\lambda$ afterwards. The main goal of this paper is to discuss
security issues that may arise using either approach.

When $k_1$ and $k_2$ are chosen uniformly at random in $[0, \sqrt{n})$, $n = \mathrm{ord}(P)$,
we provide a security proof under mild assumptions. However,
if they are chosen as random integers of $\lfloor \frac{1}{2}\log_2 n \rfloor$ bits, the resulting $k$ is
slightly skewed, and hence not suitable for use in schemes like ECDSA.
Indeed, for GLS curves, we show that this results in a bias of up to 1
bit on a suitable multiple of $k \bmod n$, and that this bias is practically
exploitable: while lattice-based attacks cannot exploit a single bit of bias,
we demonstrate that an earlier attack strategy by Bleichenbacher makes
it possible. In doing so, we set a record by carrying out the first ECDSA
full key recovery using a single bit of bias.

On the other hand, computing $k_1$ and $k_2$ by decomposing a uniformly
random $k \in [0, n)$ avoids any statistical bias, but the decomposition
algorithm may leak side-channel information. Early proposed algorithms
relied on lattice reduction and exhibited a significant amount of timing
channel leakage. More recently, constant-time approaches have also been
proposed, but we show that they are amenable to power analysis: we
describe a template attack that can be combined with classical lattice-based
attacks on ECDSA to achieve full key recovery on physical devices.

Keywords: Elliptic Curve Cryptography, GLV/GLS Method, Bleichenbacher’s
ECDSA Attacks, Side-Channel Analysis.


1 Introduction 

The GLV/GLS Techniques. Many record implementations of elliptic curve
cryptography in software, including, most recently, works such as [27,5,10], rely
on elliptic curves endowed with fast endomorphisms, as constructed by the
methods due to Gallant-Lambert-Vanstone (GLV) [15], Galbraith-Lin-Scott
(GLS) [13], and generalizations thereof. In such implementations, the fast
endomorphism $\psi$ on the elliptic curve $E/\mathbb{F}_q$ is used to speed up full-size scalar
multiplications $[k]P$ by computing them as multi-exponentiations $[k_1]P + [k_2]\psi(P)$,
where $k_1$ and $k_2$ are roughly half the size of $k$. Indeed, on a prime-order
subgroup of $E(\mathbb{F}_q)$, $\psi$ acts as multiplication by some constant $\lambda$, and thus, for a
generator $P$ of that subgroup, we have $[k_1]P + [k_2]\psi(P) = [k_1 + k_2\lambda]P$.

In order to compute random scalar multiplications with those techniques, 
two types of approaches have been considered, as far back as in the earliest 
presentations of the GLV method (such as Gallant’s talk at ECC’99 [14]). 

On the one hand, $k_1$ and $k_2$ can simply be chosen uniformly at random in
a suitable half-length interval. This approach, which we call the recomposition
technique (since $k$ is “recomposed” as $k = k_1 + k_2\lambda$), results in a very simple
implementation, and has been used in several implementation records,
including [27], but Gallant expressed concerns about possible biases in the resulting
scalar $k$. Such concerns have been partially vindicated by some numerical
evidence provided by Brumley and Nyberg [7], who also described a relatively
general way to choose intervals for $k_1$ and $k_2$ so that the resulting choice of $k$
is in fact secure (in the sense that it has high entropy). However, the
Brumley-Nyberg method is somewhat cumbersome, and no attack has so far been demonstrated
against arbitrary half-length uniform choices of $k_1$ and $k_2$, so the security
picture is somewhat unclear.

On the other hand, one can also pick $k$ at random and subsequently deduce
half-length values $k_1$ and $k_2$, which eliminates concerns regarding possible biases
in the distribution of $k$. This decomposition technique usually relies on lattice
reduction in dimension 2 (or equivalently, continued fractions, a generalized
Euclidean algorithm, etc.), as originally described in the GLV paper [15], and is
significantly more computationally demanding than recomposition.
Simplifications of this method have later been proposed (particularly in [28]), as well as
higher-dimensional generalizations [25] to tackle decompositions involving
several endomorphisms (as recently used in [?], for instance).
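A dimension-2 decomposition can be sketched in a few lines, in the spirit of the original GLV approach: run the extended Euclidean algorithm on $(n, \lambda)$ to find two short vectors $(a, b)$ with $a + b\lambda \equiv 0 \pmod n$, then round $(k, 0)$ to a nearby lattice point. This is a sketch under stated assumptions, not the specific algorithm of [28]: the rounding divides by the signed basis determinant (a detail added here so it works for either sign), $\gcd(n, \lambda) = 1$ is assumed, and the toy prime order n = 1000003 and all names are hypothetical. The plain integer arithmetic is not written to be constant time.

```python
def short_basis(n, lam):
    """Two short vectors (a, b) with a + b*lam = 0 (mod n); assumes gcd(n, lam) = 1."""
    rs, ts = [n, lam], [0, 1]            # invariant: s_i*n + t_i*lam = r_i
    while rs[-1] != 0:
        q = rs[-2] // rs[-1]
        rs.append(rs[-2] - q * rs[-1])
        ts.append(ts[-2] - q * ts[-1])
    l = max(i for i, r in enumerate(rs) if r * r >= n)   # largest l with r_l >= sqrt(n)
    v1 = (rs[l + 1], -ts[l + 1])
    cand_a, cand_b = (rs[l], -ts[l]), (rs[l + 2], -ts[l + 2])
    v2 = min(cand_a, cand_b, key=lambda v: v[0] ** 2 + v[1] ** 2)
    return v1, v2


def round_div(num, den):
    """Nearest integer to num/den (ties rounded up), for any non-zero den."""
    if den < 0:
        num, den = -num, -den
    return (2 * num + den) // (2 * den)


def decompose(k, n, v1, v2):
    """Return (k1, k2) with k = k1 + k2*lam (mod n) and |k1|, |k2| small."""
    (a1, b1), (a2, b2) = v1, v2
    det = a1 * b2 - a2 * b1              # equals +n or -n for this basis
    c1 = round_div(b2 * k, det)
    c2 = round_div(-b1 * k, det)
    return k - c1 * a1 - c2 * a2, -c1 * b1 - c2 * b2


n = 1000003                              # toy prime group order with n = 1 (mod 3)
lam = next(x for x in (pow(g, (n - 1) // 3, n) for g in range(2, n)) if x != 1)
v1, v2 = short_basis(n, lam)             # lam: a non-trivial cube root of unity mod n
k = 765432
k1, k2 = decompose(k, n, v1, v2)
print(k1, k2, (k1 + k2 * lam - k) % n)   # last value is 0
```

Because both basis vectors lie in the lattice, any integer rounding preserves $k \equiv k_1 + k_2\lambda$; the rounding only controls how small $k_1$ and $k_2$ are (at most half the sum of the corresponding basis coordinates in absolute value).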


ECDSA Attacks. The recent success of the GLV/GLS method in implementations
makes it desirable to reconsider these decomposition and recomposition
techniques from a security viewpoint. We do so in this paper in the context of
ECDSA signatures, one of the most widely deployed elliptic curve cryptographic
schemes, and an interesting target for the cryptanalyst (like other Schnorr-like
signature schemes) due to its sensitivity to biases in the distribution of nonce
values, as demonstrated by the powerful attack due to Howgrave-Graham and
Smart [?] based on lattice reduction techniques, which breaks (EC)DSA when
some of the most significant bits of the nonces are known. This attack was
analyzed in further detail by Nguyen and Shparlinski [23,24] and carried out in
practice in many contexts, including against physical devices (see e.g. [22,6] for
some examples). The basic idea is to express the key recovery problem as an
instance of the Hidden Number Problem (HNP), which reduces to the closest vector
problem (CVP) in a suitable lattice. Since CVP is tractable in low-dimensional
lattices, many practical instances of ECDSA can be broken, depending on the key size
and the number of leaked nonce bits. The largest problem instance broken so far
is the case of 2-bit nonce leaks on 160-bit curves, tackled by Liu and Nguyen [?]
using the most advanced known techniques for lattice reduction (BKZ 2.0 [9]).
Breaking 2-bit leaks on 256-bit curves, or 4-bit leaks on 384-bit curves, seems
currently out of reach (see the discussions in [9,21]).

In any case, there is a hard limit to what can be achieved using lattice
reduction: due to the underlying structure of the HNP lattice, it is impossible to attack
(EC)DSA using a single-bit nonce leak with lattice reduction. In that case, the
“hidden lattice point” corresponding to the HNP solution will not be the closest
vector even under the Gaussian heuristic (see [26]), so that lattice techniques
cannot work. To break this “lattice barrier”, the only known alternative attack is
an algorithm due to Bleichenbacher [3], which predates the attack of Howgrave-Graham
and Smart but was generally considered of mostly theoretical interest
until it was recently revisited by De Mulder et al. [?] to attack 384-bit curves.
Bleichenbacher devised his attack to demonstrate a vulnerability in DSS at the
time, in which DSA nonces were generated by picking a random value of $\ell_n$ bits,
where $\ell_n$ is the bit length of the group order $n$, and then reducing it modulo
$n$. Bleichenbacher showed that the resulting bias could be exploited in a very
interesting way, obtaining a key recovery using about $2^{41}$ signatures, with time
and memory complexities of about $2^{47}$ and $2^{41}$ respectively. At that time, it was not possible to
mount this attack in full: only simulations on reduced parameters were possible, and
the paper was never published.
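The bias that Bleichenbacher exploited can be computed exactly for a toy group order (n = 11 here, a hypothetical choice), using the sampled bias $B_n$ defined in Section 2.1: reducing a uniform $\ell_n$-bit integer modulo n makes the smallest residues twice as likely, and the bias moves visibly away from the value 0 obtained for a uniform distribution on $[0, n)$.

```python
import cmath


def bias(values, n):
    """Sampled bias (1/L) * sum_j exp(2*pi*i*v_j / n)."""
    values = list(values)
    return sum(cmath.exp(2j * cmath.pi * v / n) for v in values) / len(values)


n = 11                                        # toy group order, bit length 4
ell = n.bit_length()
dss_style = [u % n for u in range(2 ** ell)]  # every ell-bit value, reduced mod n
uniform = list(range(n))

print(round(abs(bias(uniform, n)), 6))        # 0.0
print(round(abs(bias(dss_style, n)), 3))      # 0.22
```

For a 160-bit group order the relative excess of small residues, and hence the bias, depends on how far n lies from the next power of two, so the toy value only illustrates the mechanism.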

In the first stage, Bleichenbacher’s algorithm reduces the signatures from 160
bits to, say, 40 bits using linear combinations of the original signatures; then,
during a second phase, a Discrete Fourier Transform is used to recover the most
significant bits of the secret key. The bias of the reduced signatures is lower
than the bias of the original signatures, which is why a Fourier technique
is needed to extract this information. This algorithm is very similar to the Blum,
Kalai and Wasserman algorithms [4,18] for solving the LPN and LWE problems.
For a 384-bit order, the first stage of Bleichenbacher’s original attack is not
efficient enough to reduce the signatures, and more advanced techniques based
on LLL and BKZ are needed if the number of leaked bits is high enough [21].
This modification of the first stage is not possible if less than one bit of the nonces
is available, and we then turn back to Bleichenbacher’s original attack, which requires
a high number of signatures.
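To make the role of the bias concrete, here is a toy version of only the scoring idea behind the second phase, with all parameters hypothetical and no reduction phase. For ECDSA, $s = k^{-1}(H(m) + rx) \bmod n$ gives $k \equiv h + cx$ with $h = s^{-1}H(m)$ and $c = s^{-1}r$. If the nonces are confined to the lower half of $[0, n)$ (a single-bit bias), the sampled bias of $\{h_j + c_j w\}$ is large only for $w = x$. The sketch scans every candidate directly, which is feasible only because n is tiny; in the actual attack the coefficients are first reduced to a small range so that this evaluation can be done with an FFT.

```python
import cmath
import random


def bias_magnitude(values, n):
    """|sampled bias| of a list of residues mod n."""
    return abs(sum(cmath.exp(2j * cmath.pi * v / n) for v in values)) / len(values)


def scan_secret(h, c, n):
    """Return the candidate w maximising |B_n({h_j + c_j * w mod n})|."""
    return max(range(n),
               key=lambda w: bias_magnitude([(hj + cj * w) % n for hj, cj in zip(h, c)], n))


rng = random.Random(2014)
n, x, count = 101, 37, 2000                  # toy prime order, secret, number of signatures
k = [rng.randrange(0, (n + 1) // 2) for _ in range(count)]   # nonces in the lower half
c = [rng.randrange(1, n) for _ in range(count)]
h = [(kj - cj * x) % n for kj, cj in zip(k, c)]               # so k_j = h_j + c_j * x mod n
print(scan_secret(h, c, n))                  # recovers x = 37
```

For the correct candidate the values are exactly the biased nonces (bias magnitude about 0.63 here); for any other candidate they are close to uniform, so only sampling noise of order $1/\sqrt{2000}$ remains.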

Our Contributions. Our first contribution is the first implementation of
Bleichenbacher’s attack against ECDSA with a single-bit nonce bias. We carry
out this attack on the standardized SECG P160R1 elliptic curve. On this
160-bit curve, we use $2^{33}$ ECDSA signatures, and achieve a full key recovery in a few
hours of wall-clock time on a 64-core workstation. The most time-consuming
part of the attack is the first phase, in which a sorting algorithm is executed
several times. This is the first key recovery from a single bit of bias, which
paves the way to new applications. We stress again that this record cannot be
achieved using lattice reduction techniques based on the HNP, since even if
the HNP lattice satisfies the Gaussian heuristic, a condition for finding the
hidden lattice point is that the number of known nonce bits must be greater
than $\log_2(\sqrt{\pi e/2}) \approx 1.0471$ (hence at least 2) [26], irrespective of the underlying
lattice reduction algorithm.
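The lattice barrier quoted above is a one-line computation (standard library only): the threshold is just above 1 bit, so the HNP approach needs at least 2 known nonce bits, whereas the attack in this paper works with a single bit.

```python
import math

threshold = math.log2(math.sqrt(math.pi * math.e / 2))
min_bits = math.floor(threshold) + 1      # smallest whole number of bits above the threshold
print(round(threshold, 4), min_bits)      # 1.0471 2
```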

As a second contribution, we give a security proof for the recomposition
method on curves obtained by the quadratic GLS method when the values $k_1$
and $k_2$ are uniformly distributed in the interval $[0, \sqrt{n})$, where $n$ is the prime
group order. We prove that the statistical distance between this distribution
and the uniform distribution in $[0, n)$ is negligible. Furthermore, if $k_1$ and $k_2$ are
taken at random in a small interval of the form $[0, 2^m)$, where $m = \lfloor \frac{1}{2}\log_2 n \rfloor$,
the bias on the distribution of $k$ used in Bleichenbacher’s attack is negligible.
However, we show that the bias of the distribution of $tk$, where $t$ is the trace, is
sufficiently large for Bleichenbacher’s attack to recover the secret key.
We also implement this attack, with complexities similar to those of the previous
part.

Finally, we study the decomposition technique proposed in GLV, with the implementation described by Park et al. in [28]. To this end, we propose a very efficient side-channel attack that uses the leakage of a multiplication to recover some of the least significant bits of the nonces. We can then use lattice techniques to recover the secret key.

2 Preliminaries 

2.1 Bias Definition and Properties 

The measurement of the bias of random variables represents a significant part of our analyses. We thus recall the definition of the bias proposed by Bleichenbacher in [3]:

Definition 1. Let $X$ be a random variable over $\mathbb{Z}/n\mathbb{Z}$. The bias $B_n(X)$ is defined as
$$B_n(X) = E\left(e^{2\pi i X/n}\right) = B_n(X \bmod n),$$
where $E(X)$ represents the mean.
Similarly, the sampled bias of a set of points $V = (v_0, \ldots, v_{L-1})$ in $\mathbb{Z}/n\mathbb{Z}$ is defined by
$$B_n(V) = \frac{1}{L}\sum_{j=0}^{L-1} e^{2\pi i v_j/n}.$$

The bias as defined above has some useful properties, which we recall in Lemma 1.


Lemma 1. Let $0 < T < n$ be a bound and $X, Y$ random variables uniformly distributed on the interval $[0, T-1]$.

(a) If $X$ is uniformly distributed on the interval $[0, n-1]$, then $B_n(X) = 0$.

(b) If $X$ and $Y$ are independent, then $B_n(X + Y) = B_n(X)B_n(Y)$.

(c) $B_n(-X) = \overline{B_n(X)}$, where $\overline{a}$ denotes the complex conjugate of $a$.

(d) $|B_n(X)| = \frac{1}{T}\cdot\frac{\sin(\pi T/n)}{\sin(\pi/n)}$, a real value with $0 \le |B_n(X)| \le 1$ ($B_n(X)$ itself equals this value multiplied by the phase factor $e^{\pi i (T-1)/n}$).

(e) Let $a$ be an integer with $|a|T < n$ and $Y = aX$; then $|B_n(Y)| = \frac{1}{T}\cdot\left|\frac{\sin(\pi a T/n)}{\sin(\pi a/n)}\right|$.
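The properties of Lemma 1 are easy to check numerically on small parameters. The minimal Python sketch below (the helper names and the toy values $n = 1009$, $T = 300$ are our own illustrative choices) computes the exact bias of a uniform distribution by direct summation and compares it with (a) to (e).

```python
import cmath
import math

def bias(values, n):
    """Exact bias B_n(X) for X uniform over the given list of integers."""
    return sum(cmath.exp(2j * math.pi * v / n) for v in values) / len(values)

def closed_form_abs_bias(T, n, a=1):
    """|B_n(aX)| for X uniform on [0, T-1]: Lemma 1 (d) for a = 1, (e) otherwise."""
    return abs(math.sin(math.pi * a * T / n) / (T * math.sin(math.pi * a / n)))

n, T = 1009, 300                      # toy parameters with T < n and 3T < n
X = list(range(T))
bx = bias(X, n)

print(abs(bias(range(n), n)))                                  # (a): ~0
print(abs(bias([x + y for x in X for y in X], n) - bx * bx))   # (b): ~0
print(abs(bias([-x for x in X], n) - bx.conjugate()))          # (c): ~0
print(abs(bx), closed_form_abs_bias(T, n))                     # (d): equal
print(abs(bias([3 * x for x in X], n)), closed_form_abs_bias(T, n, 3))  # (e): equal
```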


2.2 ECDSA Signature Generation 

ECDSA is a NIST standard; we describe its signature generation in Algorithm 1.

Algorithm 1. ECDSA signature. $P$ is a base point of order $n$ and $H\colon \{0,1\}^* \to [0, n-1]$ is a cryptographic hash function. The private key is an element $x \in \mathbb{Z}/n\mathbb{Z}$ and the public key is denoted by $(p, n, H, P, Q)$ with $Q = [x]P$.

1: function Sign_ECDSA($m$)
2:   $k \xleftarrow{\$} [0, n-1]$
3:   $(u, v) \leftarrow [k]P$
4:   $r \leftarrow u \bmod n$; if $r = 0$ then goto step 2
5:   $s \leftarrow k^{-1}(H(m) + rx) \bmod n$; if $s = 0$ then goto step 2
6:   return $(r, s)$
7: end function
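To make the notation of Algorithm 1 concrete, here is a minimal toy sketch in Python. It assumes the textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$, whose group has prime order $n = 19$ with base point $P = (5, 1)$, and it uses SHA-256 reduced mod $n$ as $H$; these are illustrative choices, far too small to be secure. The sketch draws $k$ from $[1, n-1]$, since $k = 0$ would give the point at infinity. The hypothetical helper `sign_with_nonce` exposes the nonce so that the relation $k = H(m)s^{-1} + rs^{-1}x \bmod n$ used in Section 3 can be checked.

```python
import hashlib
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17 (NOT secure): prime order 19, base point (5, 1).
FIELD_P, CURVE_A = 17, 2
BASE, ORDER = (5, 1), 19
INFINITY = None

def ec_add(P, Q):
    """Affine point addition, including doubling and the point at infinity."""
    if P is INFINITY:
        return Q
    if Q is INFINITY:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % FIELD_P == 0:
        return INFINITY
    if P == Q:
        lam = (3 * x1 * x1 + CURVE_A) * pow(2 * y1, -1, FIELD_P) % FIELD_P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, FIELD_P) % FIELD_P
    x3 = (lam * lam - x1 - x2) % FIELD_P
    return (x3, (lam * (x1 - x3) - y1) % FIELD_P)

def ec_mul(k, P):
    """Double-and-add computation of [k]P."""
    R = INFINITY
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

def H(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % ORDER

def sign_with_nonce(msg, x, k):
    """Steps 3-6 of Algorithm 1 for a given nonce k; None means 'goto step 2'."""
    u, _ = ec_mul(k, BASE)
    r = u % ORDER
    if r == 0:
        return None
    s = pow(k, -1, ORDER) * (H(msg) + r * x) % ORDER
    return None if s == 0 else (r, s)

def sign(msg, x):
    while True:
        k = 1 + secrets.randbelow(ORDER - 1)      # step 2, with k in [1, n-1]
        sig = sign_with_nonce(msg, x, k)
        if sig is not None:
            return sig

def verify(msg, sig, Q):
    r, s = sig
    w = pow(s, -1, ORDER)
    X = ec_add(ec_mul(H(msg) * w % ORDER, BASE), ec_mul(r * w % ORDER, Q))
    return X is not INFINITY and X[0] % ORDER == r

x = 7                        # toy private key
Q = ec_mul(x, BASE)          # public key Q = [x]P
print(verify(b"hello", sign(b"hello", x), Q))
```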


3 Bleichenbacher’s Attack on Single Bit Bias 

In this part, we present our results on an ECDSA signature generation scheme where the nonce $k$ is 1-bit biased. We demonstrate that an attack proposed some years ago by Bleichenbacher can retrieve the secret key in about $2^{37}$ time and $2^{33}$ memory given $2^{33}$ signatures, for a 160-bit group order. This attack was initially designed for the DSA signature generation scheme, but it can be applied without any modification to ECDSA, which we consider in this paper.

The main idea is to exploit the fact that the nonces $k_j$ are drawn from a biased random variable $K$, i.e. the $k_j$ are not uniformly generated on $[0, n-1]$. Because the values $k_j$ are biased and linked with the secret key $x$ by


GLV/GLS Decomposition, Power Analysis, and Attacks 267 


the equations used in the signature computations, these signatures, correctly manipulated, also present a bias which is significant only for the correct value of $x$. In other words, the bias plays the role of the distinguisher in this attack.

Obviously, for cryptographic sizes, evaluating the bias for all values in $[0, n-1]$ is impractical. However, Bleichenbacher observed that it is possible to “broaden the peak” of the bias in such a way that, for a value close to the correct value of $x$, the bias remains significant. Thus the bias computations can be performed on a sparser set of candidates, using the Fast Fourier Transform. In return, this requires a non-negligible amount of work on the signatures, which reduces the bias, and the attack only returns an approximation of the secret key, i.e. its most significant bits. The attack can be iterated to retrieve more bits of the secret, and as soon as sufficiently many bits of $x$ are known, Pollard's lambda method [29] can be used to derive the remaining bits. Algorithm 2 presents the main steps of the attack.


Algorithm 2. Bleichenbacher's attack given $S$ ECDSA signatures. The parameters $S$, $\ell$ and $l$ have to be chosen according to the bias.

Require: $S$ biased ECDSA signatures $(r_j, s_j)$ computed using a single secret key $x$.
Ensure: The $\ell$ most significant bits of $x$.

1: Preprocessing
2: for $j = 0$ to $S - 1$ do
3:   $h_j \leftarrow H(m_j) \cdot s_j^{-1} \bmod n$
4:   $c_j \leftarrow r_j \cdot s_j^{-1} \bmod n$
5: end for
6: Reduction of the $c_j$ values (sort-and-difference algorithm)
7: $A \leftarrow [(c_j, h_j)]_{0 \le j \le S-1}$
8: for $i = 1$ to $l$ do
9:   Sort $A$ by the $c_j$ values   ▷ $c_j \le c_{j+1}$
10:  for $j = 0$ to $S - i - 1$ do
11:    $A[j] \leftarrow A[j+1] - A[j]$   ▷ $A[j] = (c_{j+1} - c_j, h_{j+1} - h_j)$
12:  end for
13: end for
14: Only keep the pairs $(c_j, h_j)$ such that $c_j < 2^\ell$
15: Denote by $L$ the number of such pairs
16: Bias computation using the inverse FFT
17: $Z \leftarrow (0, \ldots, 0)$, a vector of size $2^\ell$
18: for $j = 0$ to $L - 1$ do
19:   $Z_{c_j} \leftarrow Z_{c_j} + e^{2\pi i h_j/n}$
20: end for
21: $W \leftarrow \mathrm{iFFT}(Z)$   ▷ Inverse FFT computation; the output is also a vector of complex numbers.
22: Find the value $m$ such that $|W_m|$ is maximal
23: return $\mathrm{msb}_\ell(mn/2^\ell)$
3.1 Attack Analysis

For the sake of completeness, we first explain why the bias can serve as a distinguisher and, while doing so, explain the goal of the preprocessing phase, as was done in [21]. For that purpose, consider $S$ ECDSA signatures $(r_j, s_j)$ with biased nonces $k_j$. Step 5 of Algorithm 1 gives the relation
$$k_j = H(m_j)s_j^{-1} + r_j s_j^{-1} x \bmod n \quad \text{for } 0 \le j \le S-1.$$
Now let $h_j = H(m_j)s_j^{-1} \bmod n$ and $c_j = r_j s_j^{-1} \bmod n$. Then the set $\{h_j + c_j x\}_{j=0}^{S-1} = \{k_j\}_{j=0}^{S-1}$ shows a significant nonzero sampled bias. Moreover, for any $w \ne x$, the sampled bias of $V_w = \{h_j + c_j w\}_{j=0}^{S-1}$ is relatively small. Since $h_j$ and $c_j$ are publicly computable, we thus have a way to determine the correct value of $x$ by testing all values $w \in [0, n-1]$.
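The distinguisher can be illustrated on simulated data. The minimal Python sketch below does not compute real signatures: as a simplifying assumption, it draws $c_j$ uniformly and sets $h_j = k_j - c_j x \bmod n$ for nonces $k_j$ whose most significant bit is fixed to zero, so that $k_j = h_j + c_j x \bmod n$ as in the relation above. The toy modulus, seed and sample size are arbitrary illustrative choices.

```python
import cmath
import math
import random

def sampled_bias(points, n):
    """Sampled bias B_n(V) = (1/L) * sum_j exp(2*pi*i*v_j/n)."""
    return sum(cmath.exp(2j * math.pi * v / n) for v in points) / len(points)

rng = random.Random(1)
n = 1_000_003                       # toy modulus (illustrative)
x = rng.randrange(1, n)             # secret key
pairs = []
for _ in range(2000):
    k = rng.randrange(n // 2)       # nonce with its MSB fixed to 0 (1-bit bias)
    c = rng.randrange(1, n)
    pairs.append((c, (k - c * x) % n))

def score(w):
    """|B_n(V_w)| for V_w = {h_j + c_j * w}."""
    return abs(sampled_bias([(h + c * w) % n for c, h in pairs], n))

print(round(score(x), 2))                  # roughly 2/pi for the correct key
print(round(score((x + 12345) % n), 2))    # roughly 1/sqrt(2000) for a wrong candidate
```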

To obtain a practical test, we have to broaden the peak of the bias so that values of $w$ close to the correct value $x$ also show a significant bias. The peak is broad if the $c_j$ are relatively small. More precisely, denoting by $2^\ell$ a bound such that $0 \le c_j < 2^\ell$, we can find an approximation of $x$ by evaluating the sampled bias at $2^\ell$ evenly spaced values of $w$ between 0 and $n-1$.

The reduction of the $c_j$, the second phase of Algorithm 2, can be done using a sort-and-difference algorithm. Starting from $S$ pairs $(c_j, h_j)$, we first sort them according to their first element. Then we subtract each $c_j$ from the next larger one and take the differences of the corresponding $h_j$ as well. We thus obtain a list of $S-1$ pairs $(c'_j, h'_j)$ whose values $c'_j$ are on average $\log S$ bits smaller. More details about the analysis of this reduction are given below. This reduction can be repeated until the bound $2^\ell$ is reached. Once the MSB of $x$ are known, one can rewrite the system and attack the next top bits by integrating the learnt MSB into the $h_j$, as was done in [21].

Now let $w_m = mn/2^\ell$, with $m \in [0, 2^\ell - 1]$, be $2^\ell$ evenly spaced values between 0 and $n-1$. For the sake of clarity, we keep the notation $(c_j, h_j)$ for the reduced pairs with $c_j < 2^\ell$, and we assume there are $L$ such pairs. Then
$$B_n(V_{w_m}) = \frac{1}{L}\sum_{j=0}^{L-1} e^{2\pi i (h_j + c_j mn/2^\ell)/n} = \sum_{t=0}^{2^\ell-1}\Bigg(\frac{1}{L}\sum_{\{j \mid c_j = t\}} e^{2\pi i h_j/n}\Bigg) e^{2\pi i tm/2^\ell} = \sum_{t=0}^{2^\ell-1} Z_t\, e^{2\pi i tm/2^\ell},$$
with $Z_t = \frac{1}{L}\sum_{\{j \mid c_j = t\}} e^{2\pi i h_j/n}$. Thus $B_n(V_{w_m})$ can be viewed as the inverse Fast Fourier Transform of the vector $Z = (Z_0, \ldots, Z_{2^\ell-1})$, and the multiple bias computations can be performed very efficiently using the FFT. Steps 17 to 20 of Algorithm 2 compute this vector $Z$. Step 21 outputs a vector of the sampled biases of the $2^\ell$ candidates, i.e. $\mathrm{iFFT}(Z) = (B_n(V_{w_0}), B_n(V_{w_1}), \ldots, B_n(V_{w_{2^\ell-1}}))$. Finally, the value $w_m = mn/2^\ell$ with the largest sampled bias should share its $\ell$ most significant bits with the secret key $x$.
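Putting the pieces together, the following Python sketch runs a scaled-down version of Algorithm 2 on simulated pairs, under the same simplifying assumption as a direct simulation of the relation $k_j = h_j + c_j x \bmod n$ ($c_j$ uniform, $k_j < n/2$). The parameters (a prime $n$ just below $2^{22}$, $S = 2^{12}$, one sort-and-difference round, $\ell = 11$) are our own toy choices, and a small recursive radix-2 inverse transform stands in for an FFT library. It returns the candidate $w_m = mn/2^\ell$ with the largest sampled bias, which should lie within a couple of grid steps of $x$.

```python
import cmath
import math
import random

def is_prime(m):
    """Trial division; fine for the toy sizes used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def inverse_dft(z):
    """Unnormalized inverse DFT out[m] = sum_t z[t] exp(2*pi*i*t*m/N), N a power of two."""
    size = len(z)
    if size == 1:
        return list(z)
    even, odd = inverse_dft(z[0::2]), inverse_dft(z[1::2])
    half = size // 2
    out = [0j] * size
    for m in range(half):
        tw = cmath.exp(2j * math.pi * m / size) * odd[m]
        out[m], out[m + half] = even[m] + tw, even[m] - tw
    return out

def sort_and_difference(pairs, n):
    """One reduction round (steps 9-12): sort by c, subtract consecutive pairs."""
    pairs = sorted(pairs)
    return [(c2 - c1, (h2 - h1) % n) for (c1, h1), (c2, h2) in zip(pairs, pairs[1:])]

def toy_bleichenbacher(pairs, n, ell, rounds=1):
    for _ in range(rounds):
        pairs = sort_and_difference(pairs, n)
    kept = [(c, h) for c, h in pairs if c < 2 ** ell]        # step 14
    z = [0j] * (2 ** ell)
    for c, h in kept:                                        # steps 17-20 (with 1/L)
        z[c] += cmath.exp(2j * math.pi * h / n) / len(kept)
    w = inverse_dft(z)                                       # step 21
    m = max(range(len(w)), key=lambda i: abs(w[i]))          # step 22
    return m * n // 2 ** ell, abs(w[m]), len(kept)           # candidate w_m = mn/2^ell

def toy_instance(bits=22, samples=2 ** 12, seed=2014):
    """Simulated pairs (c_j, h_j) with k_j = h_j + c_j*x mod n and k_j < n/2."""
    rng = random.Random(seed)
    n = next(q for q in range(2 ** bits - 1, 2, -1) if is_prime(q))
    x = rng.randrange(1, n)
    pairs = []
    for _ in range(samples):
        k = rng.randrange(n // 2)             # most significant bit of the nonce is 0
        c = rng.randrange(1, n)
        pairs.append((c, (k - c * x) % n))
    return n, x, pairs

n, x, pairs = toy_instance()
estimate, score, L = toy_bleichenbacher(pairs, n, ell=11)
distance = min((estimate - x) % n, (x - estimate) % n)    # circular distance to x
print(L, round(score, 3), distance <= 2 * n / 2 ** 11)
```

With one round, the expected bias of the reduced pairs is $|B_n(K)|^2 \approx 0.405$, well above the noise level $1/\sqrt{L}$, which is why a single round suffices at this toy size.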
Choosing the Parameters. We first give some properties that help define the parameters of the attack. We can estimate the sampled bias of a wrong candidate $w_m$, i.e. a value $w_m$ which does not share its most significant bits with the secret key $x$. More precisely, it can be shown that for $w_m$ either significantly larger or significantly smaller than $x$, we have $|B_n(V_{w_m})| \approx 1/\sqrt{L}$, which corresponds to the average distance from the origin of a random walk on the complex plane.

The second property concerns the $c_j$ reduction phase and gives a relation between the number of signatures $S$ and the number of reduced pairs $L$.

Proposition 1. Consider $S$ ECDSA signatures of the form $(c_j, h_j)$ and $\gamma \in \mathbb{Z}$. The proportion of pairs $(c'_j, h'_j)$ obtained after the first application of the sort-and-difference algorithm such that $c'_j < 2^{\log n - \log S + \gamma}$ can be approximated by $1 - e^{-2^\gamma}$.

This follows from Lemma 2 below, applied to the normalized values $c_j/n$ with $N = S$ and $a = 2^\gamma$.

Lemma 2. Let $X_1, \ldots, X_N$ be $N$ independent uniformly distributed random variables over $[0, 1]$, and for all $i$, denote by $X_{(i)}$ the $i$-th order statistic of the $X_j$'s (namely, $X_{(i)}$ is the $i$-th smallest among the $X_j$'s). Then the random variables $Y_i = X_{(i+1)} - X_{(i)}$ for $i = 1, \ldots, N-1$ are identically distributed, and all follow the beta distribution $B(1, N)$, with probability density function (hereafter pdf) $f(t) = N(1-t)^{N-1}$. As a result, for any constant $a > 0$, we have $\Pr[Y_i < a/N] = 1 - e^{-a} + O(1/N)$.

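Proof. Indeed, a standard formula [11, 2.2.1] expresses the joint pdf of $X_{(i)}$ and $X_{(i+1)}$ as:
$$f_{X_{(i)},X_{(i+1)}}(u, v) = \begin{cases} \dfrac{N!}{(i-1)!\,(N-i-1)!}\, u^{i-1}(1-v)^{N-i-1} & \text{for } 0 < u < v < 1,\\[4pt] 0 & \text{otherwise.}\end{cases}$$
Hence, the pdf $f_i$ of $Y_i$ is given by:
$$f_i(t) = \int_0^{1-t} f_{X_{(i)},X_{(i+1)}}(u, u+t)\,du = \frac{N!}{(i-1)!\,(N-i-1)!}\int_0^{1-t} u^{i-1}(1-t-u)^{N-i-1}\,du \quad \text{for } t \in [0, 1].$$
The change of variable $u = (1-t)w$ gives:
$$f_i(t) = \frac{N!}{(i-1)!\,(N-i-1)!}\,(1-t)^{N-1}\int_0^1 w^{i-1}(1-w)^{N-i-1}\,dw = c\,(1-t)^{N-1},$$
where $c = \frac{N!}{(i-1)!\,(N-i-1)!}\int_0^1 w^{i-1}(1-w)^{N-i-1}\,dw$. In particular, we have $f_i(t) = c\,(1-t)^{N-1}$ for some constant $c$ and all $t \in [0, 1]$, and since $\int_0^1 f_i = 1$, we must have $f_i(t) = N(1-t)^{N-1} = f(t)$ as required. As a result, we obtain:
$$\Pr\Big[Y_i < \frac{a}{N}\Big] = \int_0^{a/N} N(1-t)^{N-1}\,dt = 1 - \Big(1 - \frac{a}{N}\Big)^N = 1 - \exp\big(N\cdot(-a/N + O(1/N^2))\big) = 1 - e^{-a} + O(1/N).$$
This concludes the proof. □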
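Proposition 1 can also be checked by simulation. The minimal Python sketch below (toy sizes $n = 2^{40}$ and $S = 2^{14}$, and the helper name, are our own choices) applies $l$ rounds of sort-and-difference to uniform $c_j$ only and measures the proportion of reduced values below $2^{\log n - l(\log S - \gamma)}$, relative to the $S$ initial values. For $l = 1$ the result should be close to $1 - e^{-2^\gamma}$; for $l = 2$ it falls below $(1 - e^{-2^\gamma})^2$.

```python
import math
import random

def reduced_fraction(log_n, log_s, gamma, rounds, seed=0):
    """Fraction of the S initial values with c' < 2^(log n - rounds*(log S - gamma))."""
    rng = random.Random(seed)
    S = 2 ** log_s
    c = [rng.randrange(2 ** log_n) for _ in range(S)]
    for _ in range(rounds):
        c.sort()
        c = [b - a for a, b in zip(c, c[1:])]    # sort-and-difference on the c_j only
    bound = 2 ** (log_n - rounds * (log_s - gamma))
    return sum(v < bound for v in c) / S

for gamma in (-2, -1, 0, 1, 2):
    print(gamma,
          round(reduced_fraction(40, 14, gamma, 1), 3),   # simulated, first round
          round(1 - math.exp(-2 ** gamma), 3))            # Proposition 1
print(round(reduced_fraction(40, 14, 0, 2), 3), round((1 - math.exp(-1)) ** 2, 3))
```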

As an example, consider a modulus $n$ of size 160 bits. Starting from $2^{40}$ ECDSA signatures, after one iteration of the sort-and-difference algorithm, about 86.5% of them will have a value $c'_j < 2^{121}$ ($\gamma = 1$). The proportion drops to 22.1% if we only consider those with a value $c'_j < 2^{118}$ ($\gamma = -2$). Note that this proposition only holds for the first iteration of the algorithm, where the variables really can be considered uniformly random and independently distributed. Clearly, they are not after this: if the variables were still uniformly distributed after the first round, the ratio between $\gamma = -2$ and $\gamma = 1$ would be $0.125 = 1/2^3$, whereas it is $\approx 0.255$. Unfortunately, the ratio evolves in our disfavor as we iterate, i.e. the ratio after $l$ iterations is less than $(1 - e^{-2^\gamma})^l$. We thus do not have a lower bound. However, the ratio can be determined experimentally, and Table 1 gives an overview for different values of $\gamma$ up to 6 iterations.


Table 1. Experimental ratio between the number of ECDSA signatures of the form $(c'_j, h'_j)$ such that $c'_j < 2^{\log n - l\cdot(\log S - \gamma)}$ and the $S$ initial signatures, after $l$ iterations of the sort-and-difference algorithm

γ                 -2            -1            0             1       2
1st iteration     0.22          0.39          0.63          0.86    0.98
2nd iteration     0.031         0.12          0.36          0.75    0.94
3rd iteration     3.2·10^-3     0.025         0.17          0.64    0.89
4th iteration     (illegible)   4.6·10^-3     0.069         0.53    0.84
5th iteration     2.0·10^-5     6.7·10^-4     0.022         0.40    0.79
6th iteration     2.8·10^-6     9.5·10^-5     6.5·10^-3     0.28    0.73


Given $S$ signatures, we have to choose a pair $(\gamma, l)$ such that $\ell = \log n - l\cdot(\log S - \gamma)$ is small enough to perform an FFT in $\ell\cdot 2^\ell$ time and $2^\ell$ memory. The complexity of the algorithm is $O(S\log S + \ell\cdot 2^\ell)$. A verification is then necessary to make sure that this set of parameters gives a successful attack. Indeed, denote by $B_n(K)$ the initial bias, which is fully determined by the number of most (or least) significant bits of the $k_j$ which are known or set to zero (see Table 2 for some values). From properties (b) and (c) of Lemma 1, each iteration of the sort-and-difference algorithm reduces the bias by replacing it with the square of its norm (assuming that the variables are independent): indeed, let $X, Y$ be identically distributed and independent random variables on $[0, n-1]$; then $B_n(X) = B_n(Y)$ and $B_n(X - Y) = B_n(X)B_n(-Y) = B_n(X)\overline{B_n(Y)} = |B_n(X)|^2$. The final bias is then approximated by $|B_n(K)|^{2^l}$. Since $|B_n(V_{w_m})| \approx 1/\sqrt{L}$ for wrong candidates, the following inequality must hold:
$$|B_n(K)|^{2^l} > 1/\sqrt{L},$$
where $L$ represents as before the number of reduced pairs $(c_j, h_j)$ with $c_j < 2^\ell$. Using Table 1, which gives the ratio $L/S$ for different choices of pairs $(\gamma, l)$, we obtain a relation between $S$, $l$, $\ell$ and $n$.
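The parameter check above is simple arithmetic. The minimal Python sketch below (the function name is ours) evaluates it with ratios taken from Table 1, assuming a 1-bit bias $|B_n(K)| = 2/\pi$. The first call corresponds to the 160-bit setting used in the implementation below ($S = 2^{33}$, $l = 4$, $\gamma = 1$, ratio 0.53), the second to the 256-bit estimate given later, and the last one to a choice that fails the test.

```python
import math

def check_parameters(log_n, log_s, rounds, gamma, ratio, base_bias=2 / math.pi):
    """Return (ell, final bias, noise level 1/sqrt(L), success test) for Algorithm 2."""
    ell = log_n - rounds * (log_s - gamma)      # FFT size in bits
    L = ratio * 2 ** log_s                      # reduced pairs kept (ratio from Table 1)
    final_bias = base_bias ** (2 ** rounds)     # |B_n(K)|^(2^l)
    noise = 1 / math.sqrt(L)                    # typical score of a wrong candidate
    return ell, final_bias, noise, final_bias > noise

print(check_parameters(160, 33, 4, 1, 0.53))   # ell = 32, bias ~7.3e-4 vs noise ~1.5e-5
print(check_parameters(256, 52, 4, 1, 0.53))   # ell = 52
print(check_parameters(160, 26, 5, 1, 0.40))   # ell = 35, but the bias is too small
```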

Note that, contrary to previous reports in the literature [?], we do not need to center the $k_j$ around 0. Indeed, the sort-and-difference algorithm performs only subtractions and does not mix subtractions and additions, as is common with lattice reduction or generalized birthday algorithms.

Table 2. Some values of the bias for large $n$, when the $b$ most (or least) significant bits of $k$ are known, using Property (d) of Lemma 1

b          1           2           3           4           5
B_n(K)     0.6366198   0.9003163   0.9744954   0.9935869   0.9983944
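For large $n$, knowing $b$ bits means that $K$ is uniform on an interval of length $T = n/2^b$, and Property (d) tends to $\frac{2^b}{\pi}\sin(\pi/2^b)$. The short Python check below (our own sketch) recomputes the entries of Table 2 from this limit.

```python
import math

def bias_known_bits(b):
    """Large-n limit of (1/T) sin(pi T/n) / sin(pi/n) with T = n / 2^b."""
    return 2 ** b / math.pi * math.sin(math.pi / 2 ** b)

for b in range(1, 6):
    print(b, f"{bias_known_bits(b):.7f}")
```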


3.2 Implementation 

We successfully implemented the attack. As our target, we chose the SECG P160 R1 curve, published in 2000 by the SECG consortium [8] and still considered secure. We fixed the most significant bit of the nonces and checked (with the help of the secret) that we indeed obtained the expected bias of ≈ 0.63662. Our C++ implementation was based on the RELIC toolkit (using its provided plain C integer arithmetic) [?] and FFTW [12]. We parallelized it in a straightforward manner (including the quicksort phases) and tested it on a multicore machine.

We generated $2^{33}$ signatures and performed 4 sort-and-difference reduction phases. About 4.5 billion (52.5%) of our initial $2^{33}$ signatures had their $c_j$ reduced down to 32 bits, as expected from Table 1. The bias after 4 reduction steps was 0.000743558, which is slightly greater than the expected $0.63662^{2^4} \approx 0.00072792$. We then computed an FFT on 32 bits (selecting the reduced $c_j$ smaller than $2^{32}$). The best candidate had a score approximately 35% greater than the second one. The MSBs of the secret given by these two candidates differed only in the 31st and 32nd most significant bits. The 3rd and 4th candidates were also very close to the first two, with a score approximately 1/3 of that of the best candidate. Beyond these, there was a number of random values with maximal score approximately 1/6 of the best one. We repeated the experiment five times and got similar results, always finding at least the 30 MSB of the secret with the best candidate. We could not repeat it more often because of the high computational resources involved.

The total memory used by the signatures and FFT tables was slightly more than 1 terabyte. To recover 32 bits of the private key, the attack took approximately 1150 CPU-hours, most of it spent on data exchange, which we can decompose as follows:

— 70%: parallelized quicksort (the most memory-intensive phase)

— 18%: signature generation (approximately 250 to 430 kilocycles per signature depending on the CPU, excluding hash computations)

— 10%: candidate selection and FFT table preparation

— ≈ 1%: the FFT itself.

We did not use more parallelizable sorts such as Batcher's odd-even mergesort [2], but this would clearly be the next thing to do from a performance perspective.

The next steps of the attack, which recover the following bits of the secret, were done as in [21]. Basically, this amounts to a replay of Algorithm 2 on the initial signatures, putting the previously found MSB of the secret into the $h_j$. Write the private key $x$ as $x_0 2^m + x_1$, where $x_0$ denotes the $m$ MSB recovered in the first round. Then $h_j + c_j x = (h_j + c_j x_0 2^m) + c_j x_1$, and we want to recover the MSB of $x_1$. We proceed as in the first round, except that we now keep the $c_j$ that are smaller than $2^{\ell+m}$ instead of $2^\ell$ (thus when $\ell = m$ we just have to stop the reduction one iteration earlier). Then we build the FFT table as $Z[c_j/2^m] \leftarrow Z[c_j/2^m] + e^{2\pi i h_j/n}$. The FFT recovers the next most significant chunk of bits of the secret key. This restart of the computation makes it necessary to go back to the initial signatures, but there is no need to keep them in memory during the reduction. In practice we had barely enough memory to keep them, but to reduce memory usage they should either be stored on disk and retrieved to iterate the secret recovery, or tracked through the reduction and rebuilt afterwards.

In practice, it is advisable to take a small security margin and reinject only 30 bits of the computed MSB of the secret, to account for small variations of sampling around the peak. In any case, if we recurse with a wrong secret, the FFT will not detect any peak. Experiments indeed showed no peak in this case, the highest score not being statistically different from the other ones. This paves the way for a time/memory tradeoff: suppose the hardware is limited in memory and can only work on (say) $2^{31}$ signatures and an FFT of size $2^{30}$, instead of the $2^{33}$ needed for attacking 160 bits with 4 iterations with the previous algorithm. We first reduce the $c_j$ from 160 to 40 bits with 4 reductions as usual. We then simply guess the 10 MSB of the secret and build $2^{30}$-sized FFT tables accordingly. The correct guess is the only one among the $2^{10}$ FFTs which shows a significant peak. Since FFTs are particularly efficient, much more so than sorting, this is of practical importance. Alternatively, if it is possible to compute $2^{41}$ signatures, we can select only the expected $1/2^{10}$ fraction of signatures whose corresponding $c_j$ already have their 10 MSB equal to zero, that is to say which have 150 bits instead of 160 and can be reduced to 30 bits with 4 iterations. Finally, since the FFT table takes less memory than the signatures (a complex number occupies 16 bytes whereas a signature requires at least 40), we could improve the attack further either by carrying out several FFTs in parallel when guessing some bits of the secret, or by slightly increasing the size of the FFT table (with a corresponding increase of the selection bound on $c_j$). This would have two advantages. Firstly, it would improve the sampling around the peak and reduce the uncertainty. Secondly, the increased bound implies that some signatures would be selected after the third round of reduction instead of the fourth, thus having a much better bias and hopefully revealing more precise information about the secret.

Our experiments targeted a 160-bit curve, but it should be pointed out that larger curves are susceptible to this attack as well. Roughly speaking, one can carry out the key recovery attack with a 1-bit nonce bias on an $N$-bit curve in time $\approx 2^{N/5}\log_2\big(2^{N/5}\big)$ and memory $\approx 2^{N/5}$. For example, a 256-bit curve can be attacked in time $\approx 2^{58}$ and memory $\approx 2^{52}$: generate $2^{52}$ signatures, perform 4 reduction steps (removing $4 \cdot 51 = 204$ bits on approximately 53% of the data), keep the signatures with $c_j$ less than $2^{52}$, and carry out the FFT on a table of size $2^{52}$. One signature is $64 = 2^6$ bytes, so the total memory needed for the attack is $2^{18}$ terabytes of storage, which corresponds to 65536 of today's 4 TB disks. This does not appear to be out of reach of well-funded adversaries.
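The scaling estimate can be reproduced with a few lines of arithmetic. The Python sketch below (our own helper) assumes, as in the text, 4 reduction rounds that each remove $\log S - 1$ bits (i.e. $\gamma = 1$) and a signature stored as the pair $(c_j, h_j)$ of two $N$-bit integers. It counts 1 TB as $2^{40}$ bytes, matching the $2^{18}$ figure above.

```python
def attack_budget(curve_bits, log_s, rounds=4, gamma=1):
    """FFT size in bits, bytes per stored signature, total storage in bytes."""
    fft_bits = curve_bits - rounds * (log_s - gamma)
    sig_bytes = 2 * ((curve_bits + 7) // 8)         # the pair (c_j, h_j)
    return fft_bits, sig_bytes, sig_bytes * 2 ** log_s

fft_bits, sig_bytes, storage = attack_budget(256, 52)
tb = storage // 2 ** 40
print(fft_bits, sig_bytes, tb, tb // 4)   # 52 64 262144 65536
```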


4 Security Analysis of the Recomposition Technique 

The results presented so far have no direct connection with GLV/GLS curves. We now turn to such curves: in this section, we first discuss the security of what we call the “recomposition technique” for GLV/GLS coefficients (namely, choosing $k_1$ and $k_2$ uniformly at random in some interval $[0, K)$ to obtain $k = k_1 + k_2\lambda \bmod n$), whereas the next section focuses on the “decomposition technique”.

To fix ideas, we consider an elliptic curve $E$ obtained by the quadratic GLS method over a prime field [13, §2.1]. In other words, there is an elliptic curve $E_0$ over the prime field $\mathbb{F}_p$ such that $E$ is the quadratic twist of $E_0$ over $\mathbb{F}_{p^2}$. If we denote by $p + 1 - t$ the order of $E_0(\mathbb{F}_p)$ (where $t$ is bounded as $|t| \le 2\sqrt{p}$ by the Hasse-Weil theorem), the order $n$ of $E(\mathbb{F}_{p^2})$ satisfies:
$$n = (p-1)^2 + t^2. \qquad (1)$$

We assume that this order $n$ is prime, which is the main case of interest. Then $E$ is endowed with an efficient endomorphism $\psi$ (obtained by conjugating the Frobenius map with the twisting isomorphism) which acts on the cyclic group $E(\mathbb{F}_{p^2})$ as multiplication by
$$\lambda = t^{-1}(p-1) \pmod n. \qquad (2)$$

In particular, $\lambda^2 = (p-1)^2/t^2 = -t^2/t^2 = -1 \pmod n$.

In this setting, we first prove in §4.1 that if $k_1$ and $k_2$ are chosen uniformly at random in $[0, \sqrt{n})$, then $k = k_1 + k_2\lambda$ is statistically close to uniform in $\mathbb{Z}/n\mathbb{Z}$, so that such a choice of $(k_1, k_2)$ can be used securely in any cryptographic protocol (and in particular ECDSA). On the other hand, we show in §4.2 that if $k_1$ and $k_2$ are instead chosen in $[0, 2^m)$, where $m = \lfloor\frac{1}{2}\log_2 n\rfloor$, then $k = k_1 + k_2\lambda$ may no longer be close to uniform, and that a variant of Bleichenbacher's attack applies. In §4.3, we describe an implementation of that attack on a 160-bit GLS curve, similar to the attack of §3.
4.1 A Secure Choice of $(k_1, k_2)$

Let $E$ be a curve of prime order $n$ over $\mathbb{F}_{p^2}$ obtained by the quadratic GLS method as above. In view of (1), we have:
$$(p-1)^2 \le n \le (p-1)^2 + (2\sqrt{p})^2 = (p+1)^2,$$
and the inequalities are in fact strict, since $n$ is prime. Thus, we have $p - 1 < \sqrt{n} < p + 1$, and it follows that the distribution of $k = k_1 + k_2\lambda$ for $(k_1, k_2)$ uniform in $[0, \sqrt{n})^2$ is statistically close to the distribution of the same $k$ for $(k_1, k_2)$ uniform in $[0, p-1)^2$. We will thus concentrate on the latter, and show that it is close to uniform in $\mathbb{Z}/n\mathbb{Z}$ using the following lemma.

Lemma 3. The following map is injective:
$$F\colon [0, p-1)^2 \to \mathbb{Z}/n\mathbb{Z}, \qquad (k_1, k_2) \mapsto k_1 + k_2\lambda.$$

Proof. Consider two distinct pairs $(x, y) \ne (x', y')$ such that $F(x, y) = F(x', y')$. We have:
$$(x - x') + (y - y')\lambda \equiv 0 \pmod n$$
$$(x - x')^2 \equiv \lambda^2 (y - y')^2 \pmod n$$
$$(x - x')^2 + (y - y')^2 \equiv 0 \pmod n,$$
since $\lambda^2 = -1 \pmod n$. Thus, the positive integer $(x - x')^2 + (y - y')^2$ is divisible by $n$, and it is also smaller than $2(p-1)^2 < 2n$, so we must have $(x - x')^2 + (y - y')^2 = n$. In other words, $(x - x')^2 + (y - y')^2$ is a decomposition of $n$ as a sum of two squares. Now it is well known that, as a prime number, $n$ has at most one decomposition as a sum of two squares up to order and sign (see e.g. [20, §3.6]), and $(p-1)^2 + t^2$ is one such representation. As a result, we must have either $x - x' = \pm(p-1)$ or $y - y' = \pm(p-1)$, and neither is possible since those differences are bounded by $p - 2$ in absolute value. Hence, $F$ is injective as required. □


Theorem 1. The distribution of the values $k = k_1 + k_2\lambda$ for $(k_1, k_2)$ uniform in $[0, p-1)^2$ is statistically close to the uniform distribution on $\mathbb{Z}/n\mathbb{Z}$. More precisely, the statistical distance
$$\Delta = \sum_{k \in \mathbb{Z}/n\mathbb{Z}} \left| \Pr\big[k = k_1 + k_2\lambda \,;\, (k_1, k_2) \leftarrow [0, p-1)^2\big] - \frac{1}{n} \right|$$
is given by $\Delta = 2t^2/n$, which is negligible.

Proof. Indeed, since the function $F$ above is injective by Lemma 3, the probability $\Pr[k = k_1 + k_2\lambda \,;\, (k_1, k_2) \leftarrow [0, p-1)^2]$ is equal to $1/(p-1)^2$ for each of the $(p-1)^2$ points in the image of $F$, and 0 for each of the $n - (p-1)^2 = t^2$ points outside of that image. Therefore:
$$\Delta = (p-1)^2\left(\frac{1}{(p-1)^2} - \frac{1}{n}\right) + t^2\cdot\frac{1}{n} = 1 - \frac{(p-1)^2}{n} + \frac{t^2}{n} = \frac{2t^2}{n},$$
as required. This is bounded above by $8p/(p-1)^2$, which is indeed negligible. □
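Lemma 3 and Theorem 1 can be checked exhaustively on toy parameters. The Python sketch below picks $p = 101$ and the smallest $t$ with $|t| \le 2\sqrt{p}$ such that $n = (p-1)^2 + t^2$ is prime. These values are hypothetical and are not claimed to come from an actual curve; the check only relies on the primality of $n$ and on $\lambda^2 \equiv -1 \pmod n$, which is what the proof uses. It enumerates $F$ and computes $\Delta$ exactly with rational arithmetic.

```python
from fractions import Fraction

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

def recomposition_check(p, t):
    n = (p - 1) ** 2 + t ** 2
    lam = (p - 1) * pow(t, -1, n) % n                 # lambda = t^{-1}(p - 1) mod n
    assert lam * lam % n == n - 1                     # lambda^2 = -1 mod n
    image = {(k1 + k2 * lam) % n for k1 in range(p - 1) for k2 in range(p - 1)}
    injective = len(image) == (p - 1) ** 2            # Lemma 3
    # If F is injective, each image point has probability 1/(p-1)^2 and the others 0.
    delta = (len(image) * (Fraction(1, (p - 1) ** 2) - Fraction(1, n))
             + (n - len(image)) * Fraction(1, n))
    return n, injective, delta

p = 101
t = next(t for t in range(1, 21) if is_prime((p - 1) ** 2 + t ** 2))
n, injective, delta = recomposition_check(p, t)
print(n, t, injective, delta, Fraction(2 * t * t, n))   # delta should equal 2t^2/n
```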

Remark 1. Theorem 1 means that it is secure, in any ECC protocol instantiated over the GLS curve $E$, to sample random scalars $k$ by picking $k_1$ and $k_2$ uniformly in $[0, p-1)$, or equivalently in $[0, \sqrt{n})$.

As we can see, the proof relies on particular arithmetic properties of the quadratic GLS method (mainly the fact that $\lambda = \sqrt{-1}$ in $\mathbb{Z}/n\mathbb{Z}$), so the result does not readily extend to other settings, such as the GLV method on a curve of CM discriminant $-3$. And indeed, in that case, Brumley and Nyberg have provided evidence that choosing $(k_1, k_2)$ uniformly in $[0, \sqrt{n})$ may not yield a close-to-uniform distribution for $k$ [7, Example 3]. They suggest an alternative approach for selecting the intervals from which $k_1$ and $k_2$ are drawn that still achieves high entropy in a more general setting; but since the quadratic GLS method is one of the most widely used variants of GLV/GLS, we believe Theorem 1 is of significant practical interest.


4.2 Breaking Insecure Choices of $(k_1, k_2)$ with Bleichenbacher's Attack


In the quadratic GLS setting, we have just seen that choosing $(k_1, k_2)$ uniformly in $[0, \sqrt{n})^2$ yields a close-to-uniform distribution of $k = k_1 + k_2\lambda$. However, we can reasonably suspect that if we choose $k_1$ and $k_2$ uniformly in $[0, 2^m)$, $m = \lfloor\frac{1}{2}\log_2 n\rfloor$ (i.e. uniform bitstrings of length just under half the size of $n$), the distribution of $k$ will no longer be uniform. This is not immediately visible on the bias, however.

Indeed, if we let $T = 2^m$ and define $K_1, K_2$ as independent uniform random variables over $[0, T)$ and $K$ as the random variable in $\mathbb{Z}/n\mathbb{Z}$ given by $K = K_1 + \lambda K_2$, we have, by Lemma 1:
$$|B_n(K)| = |B_n(K_1)|\cdot|B_n(\lambda K_2)| = \frac{1}{T}\cdot\frac{\sin(\pi T/n)}{\sin(\pi/n)}\cdot\frac{1}{T}\cdot\left|\frac{\sin(\pi\lambda T/n)}{\sin(\pi\lambda/n)}\right|.$$

The first factor is very close to 1, but the second factor is usually negligible. For example, on the 160-bit GLS curve (3) below, we have $T = 2^{79}$ and $|B_n(\lambda K_2)| \approx 1.52/T$. As a result, Bleichenbacher's attack does not apply directly to this setting in general.
However, since $\lambda = t^{-1}(p-1) \pmod n$, we claim that there is a significant bias on the values $t\cdot k$. Indeed, we have:
$$\begin{aligned}
|B_n(tK)| &= |B_n(tK_1)|\cdot|B_n((p-1)K_2)| = \frac{1}{T}\cdot\frac{\sin(\pi tT/n)}{\sin(\pi t/n)}\cdot\frac{1}{T}\cdot\frac{|\sin(\pi(p-1)T/n)|}{\sin(\pi(p-1)/n)}\\
&= \frac{1}{T}\cdot\frac{\pi tT/n + O((tT/n)^3)}{\pi t/n + O((t/n)^3)}\cdot\frac{1}{T}\cdot\frac{|\sin(\pi(p-1)T/n)|}{\pi(p-1)/n + O(((p-1)/n)^3)}\\
&= \big(1 + O((tT/n)^2 + (p/n)^2)\big)\cdot\frac{|\sin(\pi(p-1)T/n)|}{\pi(p-1)T/n}.
\end{aligned}$$

The big-O in the first factor is negligible since $tT/n = O(p^{1/2}\cdot p/p^2) = O(p^{-1/2})$ and $p/n = O(p^{-1})$. On the other hand, $(p-1)T/n \approx T/\sqrt{n}$ is roughly between 0.5 and 1, depending on how close $n$ is to a power of two. Thus, the bias is significant in general, and it is maximal when $(p-1)T/n$ is smallest (close to 1/2), which happens when $n$ is just under a power of two. The bias $|B_n(tK)|$ is then close to $1/(\pi\cdot 1/2) = 2/\pi \approx 0.637$.

It is then straightforward to adapt Bleichenbacher's attack to this setting by targeting the values $t\cdot k$ instead of $k$. Using this variant, we can then break ECDSA signatures that use nonces of the form $k = k_1 + k_2\lambda$ as above. An implementation of that attack is discussed in the next subsection.


4.3 Implementation of Bleichenbacher’s Attack in the GLS Setting 

We carry out the attack described above on the 160-bit GLS curve $E$ defined as follows. Over the 80-bit prime field¹ $\mathbb{F}_p$, $p = 255\cdot 2^{72} + 1$, we define $E_0\colon y^2 = x^3 - 3x/23 + 104$. Then the elliptic curve $E$ is the quadratic twist of $E_0$ over $\mathbb{F}_{p^2} = \mathbb{F}_p(\sqrt{23})$, namely:
$$E\colon y^2 = x^3 - 3x + 104\sqrt{23} \quad \text{over } \mathbb{F}_{p^2}. \qquad (3)$$

The order of $E_0(\mathbb{F}_p)$ is $p + 1 - t$ for $t = 776009485427$, and $E(\mathbb{F}_{p^2})$ has prime order $n = (p-1)^2 + t^2$. The theoretical value of the bias $|B_n(tK)|$, computed using the exact formula above, is then ≈ 0.634.
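These quantities can be recomputed directly from $p$ and $t$. The following Python sketch (our own check) rebuilds $n$ and $\lambda$ from (1) and (2), verifies $\lambda^2 \equiv -1 \pmod n$ and the Hasse bound, and evaluates the exact bias formula of Section 4.2 with $T = 2^{79}$. It does not test the primality of $n$, which we take from the text.

```python
import math

p = 255 * 2 ** 72 + 1
t = 776009485427
n = (p - 1) ** 2 + t ** 2                 # equation (1)
lam = pow(t, -1, n) * (p - 1) % n         # equation (2)
T = 2 ** 79                               # T = 2^m with m = floor(log2(n) / 2)

def sin_ratio(a):
    """(1/T) |sin(pi*a*T/n) / sin(pi*a/n)|; reducing a*T mod n only flips the sine's sign."""
    return abs(math.sin(math.pi * (a * T % n) / n) / (T * math.sin(math.pi * a / n)))

bias_tk = sin_ratio(t) * sin_ratio(p - 1)  # |B_n(tK_1)| * |B_n((p-1)K_2)|
print(n.bit_length(), lam * lam % n == n - 1, t * t <= 4 * p)
print(round(bias_tk, 4))                   # about 0.634
```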

We performed the recovery of the 32 MSB of a private key as in Section 3.2. We computed $2^{33}$ signatures and ran the attack on the pairs $(tc_j \bmod n, th_j \bmod n)$ instead of $(c_j, h_j)$. We checked the bias and obtained ≈ 0.634116, which is close to the theoretical value. In practice, the attack took about 2000 CPU-hours, with 56% for the signature generation, 37% for the four sort-and-difference reduction steps, 5% for the candidate selection and FFT table preparation, and less than 0.5% for the FFT itself. In terms of wall-clock time, except for the signature generation, which took (much) longer, the other phases were identical to those of Section 3.2. We attribute this unexpected increase in signing time to threshold effects: for example, representing elements of a prime field with ≈ $2^{160}$ elements needs only 3 64-bit words, whereas on $\mathbb{F}_{p^2}$ we needed $4 \times 2 = 8$ words.


¹ This is an example of an “optimal prime field” (OPF); see e.g. [32].
5 Security Analysis of the Decomposition Technique

In this section, we analyze the security of algorithms for computing the decomposition of the nonces used in the GLV method from a side-channel perspective. Many techniques have been proposed, including [15, 28]. The original GLV method [?], which is based on LLL reduction of a lattice that depends on the nonce $k$, and its variants have an execution time that depends on $k$, and are therefore vulnerable to timing attacks.

We therefore examine the security of a potentially more secure approach, the decomposition technique of Park et al. [28], using a more involved power analysis technique.

5.1 Decomposition Algorithm 

Park et al. provide an alternative to the decomposition of the GLV paper [?], which reduces the theoretical bound for the decomposition using the theory of the μ-Euclidean algorithm and is slightly faster. The algorithm requires two short and independent vectors $v_1$ and $v_2$ of the two-dimensional lattice $\mathcal{L} = \{(x, y) : x + y\lambda \equiv 0 \bmod n\}$. These vectors can be found during a precomputation phase using Gauss reduction. The algorithm consists in finding, using linear algebra, a vector of the lattice $\mathcal{L} = \mathbb{Z}v_1 + \mathbb{Z}v_2$ that is close to $(k, 0)$. Then $(k_1, k_2)$ is determined by the equation:
$$(k_1, k_2) = (k, 0) - \big(\lfloor b_1\rceil v_1 + \lfloor b_2\rceil v_2\big),$$
where $(k, 0) = b_1 v_1 + b_2 v_2$ is written as an element of $\mathbb{Q}\times\mathbb{Q}$.


Algorithm 3. Decomposition technique of Park et al. [28]
Require: $k \sim n$, the shortest vectors $v_1 = (x_1, y_1)$, $v_2 = (x_2, y_2)$
Ensure: $(k_1, k_2)$ such that $k = k_1 + k_2\lambda \pmod n$

1: $D = x_1 y_2 - x_2 y_1$, $a_1 = y_2 k$, $a_2 = -y_1 k$
2: $z_i = \lfloor a_i/D\rceil$ for $i = 1, 2$
3: $k_1 = k - (z_1 x_1 + z_2 x_2)$, $k_2 = -(z_1 y_1 + z_2 y_2)$
4: return $(k_1, k_2)$
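Algorithm 3, together with the Gauss reduction used to precompute $v_1, v_2$, fits in a few lines of Python. In this minimal sketch, the choice of $n$ and $\lambda$ is our own: we simply reuse the 160-bit GLS parameters of Section 4.3 as a test case, and exact rational rounding stands in for the fixed-point arithmetic an embedded implementation would use.

```python
import random
from fractions import Fraction
from math import isqrt

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a 2-dimensional lattice basis."""
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        q = round(Fraction(dot(u, v), dot(u, u)))
        v = (v[0] - q * u[0], v[1] - q * u[1])
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u

def decompose(k, v1, v2):
    """Algorithm 3: returns (k1, k2) with k = k1 + k2*lambda mod n."""
    (x1, y1), (x2, y2) = v1, v2
    D = x1 * y2 - x2 * y1
    a1, a2 = y2 * k, -y1 * k                 # the multiplications of k by y2 and y1
    z1, z2 = round(Fraction(a1, D)), round(Fraction(a2, D))
    return k - (z1 * x1 + z2 * x2), -(z1 * y1 + z2 * y2)

p, t = 255 * 2 ** 72 + 1, 776009485427       # Section 4.3 parameters, reused as a test case
n = (p - 1) ** 2 + t ** 2
lam = pow(t, -1, n) * (p - 1) % n
v1, v2 = gauss_reduce((n, 0), (-lam, 1))     # both satisfy x + y*lam = 0 mod n

k = random.randrange(n)
k1, k2 = decompose(k, v1, v2)
print((k1 + k2 * lam - k) % n == 0, max(abs(k1), abs(k2)) <= isqrt(n) + 1)
```

For this GLS lattice, the reduced basis consists of two orthogonal vectors of norm $\sqrt{n}$, so both halves are bounded by about $\sqrt{n}$ in absolute value.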


The decomposition technique depicted in Algorithm 3 performs many computations involving the sensitive nonce $k$. In particular, the computation of $a_1$ (resp. $a_2$) is based on a multiplication of the nonce $k$ by $y_2$ (resp. $y_1$), which is assumed to be known since it is a precomputed value obtained from public parameters using a deterministic algorithm.

Suppose now that we know the least significant byte of $\ell$ nonces $k_1, \ldots, k_\ell$. The best strategy for finding the secret key $x$ is then to perform classical lattice attacks as proposed in [17, 23, 24]. For a 160-bit modulus, the lattice attack works consistently for $\ell \ge 27$. However, the side-channel attack may sometimes fail, i.e. the returned byte of some $k_j$ can be wrong.
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Thus, by denoting 0 < c < 1 the confidence rate, the side-channel attack has to 
be performed on m > [27 /c] signatures. Then: 

— Select 27 signatures at random among them. 

— Perform the attack using these signatures. 

— If the attack fails, goto the first step. 

The probability of success at each iteration of the lattice attack is $\binom{cm}{27} \big/ \binom{m}{27}$. As an example, suppose we obtain $m = 200$ signatures and can guess the least significant byte with 90% accuracy ($c = 0.9$). Then the probability of success of the lattice attack is about 4.7%, and 21 lattice reductions have to be performed on average. Since LLL reductions are cheap, much lower success probabilities are tractable as well.
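The figures in the example above can be checked with a few lines of Python. This sketch assumes, as in the example, that exactly $cm$ of the $m$ guessed bytes are correct and that the 27 signatures are drawn without replacement; the function name is illustrative.

```python
from math import comb

def lattice_success_probability(m, c, needed=27):
    """Probability that `needed` signatures drawn at random (without replacement)
    among m all carry a correctly guessed nonce byte, assuming c*m of them are correct."""
    correct = round(c * m)
    return comb(correct, needed) / comb(m, needed)

p = lattice_success_probability(200, 0.9)
expected_reductions = 1 / p  # mean of the geometric number of trials until the first success
```

With $m = 200$ and $c = 0.9$ this gives a probability close to 0.047, i.e. roughly 21 lattice reductions on average.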

In the following, we discuss the side-channel attack that aims at recovering the first byte of the nonce by targeting the two aforementioned multiplications. We present the attack in the particular case of an 8-bit implementation (which corresponds to the device we used in our experiments). Note that this attack may also work for 16-bit implementations, but in this case the computational cost will be larger and the success rate smaller.


5.2 Side-Channel Attack on this Implementation 

The details of the attack highly depend on the way the multiplication is implemented. Depending on the underlying algorithm, the attack may be more or less difficult. We present here the attack corresponding to the implementation we target, but we will discuss adaptations to different algorithms. The multiplication we target is a schoolbook multiplication with the nonce being scanned in the outer loop. Algorithm 4 outlines the implementation of such a multiplication for $\ell_n$-bit nonces and $\ell_n/2$-bit $b$.


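Algorithm 4. Multiplication $v = kb$ of $k = \sum_{i=0}^{\ell_n/8-1} k_i 2^{8i}$ times $b = \sum_{j=0}^{\ell_n/16-1} b_j 2^{8j}$

Require: $\ell_n$-bit $k$ and $\ell_n/2$-bit $b$, two integers
Ensure: $v = k \times b$

1: $v \leftarrow 0$
2: for $i = 0$ to $i < \ell_n/8$ do
3:   $c_0 \leftarrow 0$
4:   for $j = 0$ to $j < \ell_n/16$ do
5:     $t \leftarrow v_{i+j} + k_i \times b_j + c_j$
6:     $v_{i+j} \leftarrow t \,\&\, \mathtt{0xFF}$
7:     $c_{j+1} \leftarrow t \gg 8$
8:   end for
9:   $v_{i+\ell_n/16} \leftarrow c_{\ell_n/16}$
10: end for
11: return $v$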
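A minimal Python sketch of a byte-wise schoolbook multiplication with the nonce scanned in the outer loop is given below. It accumulates into $v_{i+j}$ and stores the final carry of each row, as a full product requires; in the first outer iteration ($i = 0$) the accumulator is still zero, so each inner step reduces to $k_0 \times b_j + c_j = 2^8 c_{j+1} + v_j$, which is the relation exploited by the attack. The helper names and the operand values are hypothetical.

```python
def to_bytes_le(x, length):
    return [(x >> (8 * i)) & 0xFF for i in range(length)]

def from_bytes_le(byte_list):
    return sum(byte << (8 * i) for i, byte in enumerate(byte_list))

def schoolbook_mul(k_bytes, b_bytes):
    """Byte-wise schoolbook product, nonce bytes k_i scanned in the outer loop."""
    v = [0] * (len(k_bytes) + len(b_bytes))
    for i, k_i in enumerate(k_bytes):
        c = 0  # c_0 = 0 at the start of every row
        for j, b_j in enumerate(b_bytes):
            t = v[i + j] + k_i * b_j + c  # at most 0xFFFF, i.e. 16 bits
            v[i + j] = t & 0xFF
            c = t >> 8
        v[i + len(b_bytes)] = c  # final carry of row i
    return v

def first_row_steps(k0, b_bytes):
    """Intermediate values (b_j, c_j, v_j, c_{j+1}) of the first row, which all involve k_0."""
    steps, c = [], 0
    for b_j in b_bytes:
        t = k0 * b_j + c
        steps.append((b_j, c, t & 0xFF, t >> 8))
        c = t >> 8
    return steps

k = (1 << 159) | 0x0123456789ABCDEF  # hypothetical 160-bit nonce
b = (1 << 79) | 0xFEDCBA98           # hypothetical 80-bit auxiliary value
product = from_bytes_le(schoolbook_mul(to_bytes_le(k, 20), to_bytes_le(b, 10)))
```

The tuples returned by `first_row_steps` are exactly the variables whose leakages the attacker combines for each step of the inner loop.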


The idea is to take advantage of all operations involving the first nonce byte in the inner loop to recover its value. This can be done by propagating a probability distribution from one operation to the next and updating it with the corresponding leakages. Since the nonce bits have to be recovered using a single trace (the nonce is randomly generated for each signature), we place ourselves in the context of a profiled attack. The application of such an attack in a non-profiled setting is left as an open question.

Template Attack on One Step. One step of the inner loop consists in multiplying the first byte of the nonce $k_0$ by a byte $b_i$ of the auxiliary input and adding the carry $c_i$. This results in a value $v_i$ and a new carry $c_{i+1}$. We may obtain leakages for each of these variables. We denote by capital letters the output distributions of template exploitation corresponding to lowercase variables. For instance, after processing the leakage corresponding to $c_i$, the attacker gets a distribution

$$C_i = (\Pr(c_i = 0), \Pr(c_i = 1), \ldots, \Pr(c_i = 255)).$$

Since these variables may be manipulated more than once during the computation, different leakage points may be combined by multiplying probabilities and then normalizing the resulting distribution. More precisely, let $l_1, l_2, \ldots, l_t$ be leakages corresponding to variable $k_0$; then the distribution $K_0$ obtained from these leakages is computed as

$$\Pr(k_0 = x) = \frac{1}{Z} \prod_{j=1}^{t} \Pr(k_0 = x \mid l_j),$$

where the normalizing coefficient $Z$ is given by $Z = \sum_{x} \prod_{j=1}^{t} \Pr(k_0 = x \mid l_j)$.
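The normalized product above can be sketched in Python as follows. The sketch assumes, for illustration, that the leakages are independent and that the prior on $k_0$ is uniform; a toy domain of 4 values stands in for the 256 byte values, and the function name is hypothetical.

```python
def combine_leakages(posteriors):
    """Combine posteriors Pr(k0 = x | l_j), j = 1..t, of the same variable:
    pointwise product, then division by Z = sum_x prod_j Pr(k0 = x | l_j)."""
    combined = [1.0] * len(posteriors[0])
    for post in posteriors:
        combined = [a * p for a, p in zip(combined, post)]
    z = sum(combined)
    if z == 0.0:
        raise ValueError("incompatible leakages: every hypothesis has zero probability")
    return [a / z for a in combined]

K0 = combine_leakages([[0.4, 0.3, 0.2, 0.1], [0.1, 0.6, 0.2, 0.1]])
```

Here the second value dominates after combination (0.18/0.27, about 0.67), although the first leakage alone slightly favored the first value.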

Propagating and Updating Distributions. Let us now discuss how to take advantage of all the leakages of the inner loop to gain information on the byte $k_0$. The main idea is to gather all the information from all variables of a given step $i$ into the distributions $K_0$ and $C_{i+1}$, then do the same at step $i+1$ using the newly updated distributions. From a probabilistic point of view, we should compute the joint distribution of the variables of step $i$, then compute the marginalized distributions $K_0$ and $C_{i+1}$. The following algorithm updates the distributions $K_0$ and $C_{i+1}$ according to the distributions of variables $b_i$, $v_i$ and $c_i$.


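Algorithm 5. Information propagation for one step of the multiplication inner loop

Require: distributions $K_0$, $B_i$, $V_i$, $C_i$ and $C_{i+1}$
Ensure: $K'_0$ and $C'_{i+1}$, the updated distributions

1: $K'_0 \leftarrow (0, 0, \ldots, 0)$, $C'_{i+1} \leftarrow (0, 0, \ldots, 0)$
2: for $0 \le k, b, c < 256$ do
3:   $u \leftarrow \lfloor (k \times b + c)/2^8 \rfloor$, $v \leftarrow (k \times b + c) \bmod 2^8$, so that $2^8 u + v = k \times b + c$
4:   $K'_0(k) \leftarrow K'_0(k) + K_0(k) \cdot B_i(b) \cdot C_i(c) \cdot V_i(v) \cdot C_{i+1}(u)$
5:   $C'_{i+1}(u) \leftarrow C'_{i+1}(u) + K_0(k) \cdot B_i(b) \cdot C_i(c) \cdot V_i(v) \cdot C_{i+1}(u)$
6: end for
7: return $K'_0 / \sum_k K'_0(k)$ and $C'_{i+1} / \sum_u C'_{i+1}(u)$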






D.F. Aranha et al. 


The attacker starts by using Algorithm 5 for the first step. Then she uses the newly updated distributions $K_0$ and $C_1$ and the initial distributions $B_1$, $V_1$ and $C_2$ as inputs of Algorithm 5, and so on. At the end, the attacker gets the final distribution $K_0$, from which she can derive the most likely value of the least significant bit (or of more bits).
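The one-step update of Algorithm 5 and its chaining over the inner loop can be sketched in Python as below. The word size `w` is a hypothetical parameter ($w = 8$ in the text) introduced so that the sketch can be exercised quickly with 4-bit words; for $w = 8$, one step enumerates $2^{24}$ triples $(k, b, c)$, which is slow in pure Python. The toy example assumes exact templates for $b_i$ and $v_i$ and no information on the carries except the known $c_0 = 0$; all names are illustrative.

```python
def update_step(K0, Bi, Vi, Ci, C_next, w=8):
    """One application of Algorithm 5 for w-bit words; each argument has 2^w entries."""
    size = 1 << w
    K_new = [0.0] * size
    C_new = [0.0] * size
    for k in range(size):
        if K0[k] == 0.0:
            continue
        for b in range(size):
            p_kb = K0[k] * Bi[b]
            if p_kb == 0.0:
                continue
            for c in range(size):
                u, v = divmod(k * b + c, size)  # k*b + c = 2^w * u + v, with u < 2^w
                p = p_kb * Ci[c] * Vi[v] * C_next[u]
                K_new[k] += p
                C_new[u] += p
    z = sum(K_new)  # equals sum(C_new); zero only if the templates are inconsistent
    return [x / z for x in K_new], [x / z for x in C_new]

def recover_k0(K0, B, V, C, w=8):
    """Chain Algorithm 5: the carry distribution output by step i becomes the
    incoming carry of step i + 1; C[i + 1] is the initial template of c_{i+1}."""
    carry = C[0]
    for i in range(len(B)):
        K0, carry = update_step(K0, B[i], V[i], carry, C[i + 1], w)
    return K0

def point_mass(value, size):
    return [1.0 if x == value else 0.0 for x in range(size)]

# Toy example with 4-bit words: k_0 = 11 and b = (3, 14, 9).
w, size = 4, 16
k0, b = 11, [3, 14, 9]
carry, v_true = 0, []
for b_j in b:
    carry, v_j = divmod(k0 * b_j + carry, size)
    v_true.append(v_j)
uniform = [1.0 / size] * size
K0 = recover_k0(uniform, [point_mass(x, size) for x in b],
                [point_mass(x, size) for x in v_true],
                [point_mass(0, size)] + [uniform] * len(b), w)
```

In this toy case the first step alone already pins $k_0$ down, since $3k_0 \equiv 1 \pmod{16}$ has the unique solution $k_0 = 11$; with noisy templates, the later steps are what sharpen the distribution.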
Abstract. In this paper, we introduce a new approach to side-channel
key recovery, that combines the low time/memory complexity and noise 
tolerance of standard (divide and conquer) differential power analysis 
with the optimal data complexity of algebraic side-channel attacks. Our 
fundamental contribution for this purpose is to change the way of ex- 
pressing the problem, from the system of equations used in algebraic at- 
tacks to a code, essentially inspired by low density parity check codes. We 
then show that such codes can be efficiently decoded, taking advantage of 
the sparsity of the information corresponding to intermediate variables in 
actual leakage traces. The resulting soft analytical side-channel attacks 
work under the same profiling assumptions as template attacks, and di- 
rectly exploit the vectors of probabilities produced by these attacks. As 
a result, we bridge the gap between popular side-channel distinguishers 
based on simple statistical tests and previous approaches to analytical 
side-channel attacks that could only exploit hard information so far. 


1 Introduction 

The great majority of side-channel attacks published in the literature follow a 
divide and conquer strategy (DC). That is, they first attack independent parts 
of the key separately (divide), and then combine these pieces of information 
(conquer). Information on individual parts of the key is obtained by study- 
ing correlations between key-dependent leakage predictions and the actual side- 
channel measurements. The information can then be combined either by simply 
concatenating the most probable values of each key part together, or by using 
an enumeration algorithm [27,23]. Examples of distinguishers exploiting such a
strategy include Kocher et al.’s Differential Power Analysis (DPA) [13], Brier et 
al.’s Correlation Power Analysis (CPA) [2], Gierlichs et al.’s Mutual Information 
Analysis (MIA) [9], Chari et al.’s Template Attacks (TA) [4] and Schindler et
al.’s Stochastic Approach (SA) [23]. The popularity of these tools is due to their 
simplicity and versatility: they can be adapted to essentially any implementa- 
tion, have low time complexity and work in a gray box manner. That is, they do 
not require a precise understanding of the underlying hardware, but their data 
complexity is highly dependent on the quality of the adversary’s leakage predic- 
tions. Therefore, the knowledge of implementation details and some engineering 
intuition can usually be exploited to improve their time and data complexity. 

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, PART I, LNCS 8873, pp. 282-296, 2014.

(c) International Association for Cryptologic Research 2014 
In this context, one fundamental question regarding DC distinguishers is whether they are sufficient for security evaluations. That is, are the security levels estimated with such tools close enough to the worst case? In view of the previously listed qualities (and in particular, their excellent time complexity), the most likely drawback candidate for DC strategies is a suboptimal data complexity. As a result, a number of research works have investigated whether the application of analytical strategies (i.e. targeting the full key at once) could provide improved results. To the best of our knowledge, one of the first attempts in this direction was Mangard’s Simple Power Analysis (SPA) against the AES key expansion algorithm [15]. The Side-Channel Collision Attacks (SCCA) in [11,24,25] were next interesting steps, in which the key is recovered by solving a set of (mostly) linear equations corresponding to the first cipher round(s). Subsequently, Algebraic Side-Channel Attacks (ASCA) were introduced in [21,22] and probably constitute the most representative example of analytical strategy to date. Under certain conditions, they are able to extract the key of an AES implementation from a single leakage trace, in an unknown plaintext/ciphertext scenario.

So to some extent, ASCA could be viewed as an extreme opposite to DC attacks, with a minimum data complexity coming at the cost of a (much) more complex and sensitive solving phase, hence raising questions regarding their practical relevance. For example, in the first papers from Renauld et al., the adversary represents the target block cipher and its leakages as an instance of the satisfiability problem that she sends to a generic SAT solver (other types of solvers, e.g. based on Gröbner bases, have also been analyzed [?]). The main issue with this approach is a very weak resistance to noise, since the solver essentially needs to be fed with correct hard information. For this purpose, the usual strategy was to group certain leakage values according to a model with lower cardinality, e.g. the well-known Hamming weight one, in order to trade informativeness for robustness. Improved heuristics are presented in [17,25]. More recently, Oren et al. proposed to replace the use of a solver by that of an optimizer, leading to Tolerant ASCA (TASCA), able to exploit more general models [18,19]. Yet, even these last attempts were quite inefficient in exploiting soft information, mainly because of the difficulty of translating a vector of probabilities (e.g. as provided by classical TA) into an optimizer-friendly format. In fact, TASCA essentially encodes these vectors as exhaustive hard information, hence limiting the number of leaking operations that could be included in the optimizer to a couple of rounds (compared to the full cipher in ASCA), because of memory issues. Finally, the results in [?] provide yet another powerful approach to analytical side-channel attacks, based on smart enumeration and specialized to the AES, but so far they also remain limited to the exploitation of hard information.

This state-of-the-art seems to suggest that the probabilistic information pro- 
vided by side-channel leakages can be easily exploited with DC attacks, while 
analytical strategies require a preprocessing step to translate this soft informa- 
tion into hard one. In this paper, we argue that this intuition is flawed, and 
in fact relates to the way of formulating the problem rather than to its na- 
ture. That is, while previous analytical attacks were expressing the target block 
ciphers and their leakages as equations, we propose to describe the same problem as a code. As a result, and for the first time, we detail a Soft Analytical Side-Channel Attack (SASCA) that combines the best of both worlds, namely the noise robustness and low time complexity of DC strategies with the low data complexity of analytical ones. In this respect, our first contribution is to exhibit a natural way to encode a side-channel cryptanalysis problem. Next, we show that we can efficiently decode such problems thanks to the Belief Propagation algorithm (BP). Using these new tools, we are able (i) for low noise levels: to attack the AES FURIOUS implementation that was targeted in previous works on ASCA/TASCA with a single leakage trace, with significantly reduced time and memory complexities, (ii) for large noise levels: to attack the same implementation with multiple plaintexts, but with $2^3$ to $2^4$ times fewer traces than a standard TA. In summary, the proposed technique bridges the gap between DPA and ASCA.

Related Works. While the motivation for SASCA quite directly derives from previous works in ASCA/TASCA, its mathematical modeling fundamentally differs from them and is in fact much closer to some results exploiting techniques from coding theory. In particular, the application of Hidden Markov Models in the context of time-randomized implementations [?] or side-channel disassemblers [6], and the decoding of Low Density Parity Check (LDPC) codes in the context of SCCA [8] were sources of inspiration for this work.

Cautionary Note. In order to show the applicability of SASCA at different noise levels, our empirical results are based on simulated experiments. Yet, we insist that SASCA is (in general) just as realistic as any TA, since it relies on the same assumptions for the profiling phase (i.e. the knowledge of a single key; see the Appendix). Furthermore, we paid attention to exploit exhaustive templates (i.e. we used 256 profiles per attacked intermediate value), which can be generalized to any leakage function and correspond to the worst-case time complexity.

2 Soft Analytical Side-Channel Attacks 

We first emphasize the differences between previous solver- or optimizer-based 
approaches to analytical side-channel attacks and our decoder-based solution. We 
then describe the BP algorithm and discuss its connection to the exploitation of 
side-channel leakage. We finally detail how to describe an AES implementation as 
a factor graph, that can be efficiently decoded by BP. The following descriptions 
assume a profiled attack scenario, as usual in worst-case evaluations [26] . 


2.1 Solving (or Optimizing) vs. Decoding 

In the course of a profiled side-channel attack, the adversary extracts information from leakage traces. This information comes from the processing of intermediate values throughout the cryptographic computations. By comparing these leakages with previously estimated templates, she obtains for each target value $X_i$ a conditional posterior distribution $\Pr[X_i \mid L]$. Provided the device is not perfectly side-channel resistant, most of the posterior distributions will have an entropy lower than that of a uniform distribution. In this context, the most interesting pieces of information relate to the encryption key. For this purpose, not only the leakages that directly correspond to key bytes (informally denoted as SPA leakages) are exploited, but also those of intermediate variables that depend on both the key and the (usually known) plaintext (informally denoted as DPA leakages), typically the SBOX outputs in the first AES round. For example, starting from the posterior probability of the output value $S_{out}$ given
the leakage $L_{out}$, one can deduce its image before the substitution layer:

$$\Pr[S_{in} = y \mid L_{out}] = \Pr[S_{out} = \mathrm{SBOX}(y) \mid L_{out}].$$

For a known plaintext value $P$, one can compute a posterior distribution on a key byte $K$ by unrolling the computation one step further:

$$\Pr[K = k \mid P = p, L_{out}] = \Pr[S_{out} = \mathrm{SBOX}(k \oplus p) \mid L_{out}].$$
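The two equations above can be illustrated with a short Python sketch that builds the AES S-box from its GF($2^8$) definition and scores key-byte hypotheses from S-box output leakages. The Gaussian Hamming-weight template, the noise level and the key and plaintext values are assumptions made for illustration only. Scores are unnormalized log-posteriors; since $k \mapsto \mathrm{SBOX}(k \oplus p)$ is a bijection, the per-trace normalization constant is the same for every key hypothesis, so summing log-scores over traces is equivalent to multiplying normalized posteriors.

```python
def xtime(a):
    """Multiplication by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r

def aes_sbox():
    inv = [0] * 256  # multiplicative inverse, with 0 mapped to 0
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
    sbox = []
    for x in range(256):
        s = inv[x]
        for i in range(1, 5):  # affine map: XOR of four left rotations, then 0x63
            s ^= ((inv[x] << i) | (inv[x] >> (8 - i))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = aes_sbox()
HW = [bin(v).count("1") for v in range(256)]

def sbox_out_log_posterior(leak, sigma):
    """Unnormalized log Pr[S_out = s | leak] under a hypothetical Gaussian HW template."""
    return [-((leak - HW[s]) ** 2) / (2 * sigma ** 2) for s in range(256)]

def key_log_scores(plaintexts, leaks, sigma):
    """Sum over traces of log Pr[K = k | p, L_out] = log Pr[S_out = SBOX(k ^ p) | L_out]."""
    scores = [0.0] * 256
    for p, leak in zip(plaintexts, leaks):
        post = sbox_out_log_posterior(leak, sigma)
        for k in range(256):
            scores[k] += post[SBOX[k ^ p]]
    return scores

# Noise-free simulated leakages for a hypothetical key byte 0x2B.
key = 0x2B
plaintexts = list(range(0, 256, 7))
leaks = [HW[SBOX[key ^ p]] for p in plaintexts]
scores = key_log_scores(plaintexts, leaks, sigma=0.5)
best = max(range(256), key=scores.__getitem__)
```

This is exactly the divide-and-conquer use of the first-round S-box leakage; it only reaches one key byte at a time, which is the limitation discussed next.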

These simple equations show that it is possible to derive information about 
the key using intermediate variables. Furthermore, one can easily combine the 
leakage obtained from multiple plaintexts, by marginalizing Pr (K = k) over the 
corresponding traces: this is in fact what DC attacks do. Next, and since multiple 
key-dependent variables can usually be found within cryptographic implemen- 
tations, a natural problem is to find ways to exploit them efficiently. But this is 
exactly where the DC strategy faces limitations. Namely, combining the leakage 
of these intermediate variables is trivial as long as they only depend on a single 
key byte, e.g. the SBOX inputs and outputs in a first block cipher round. One just 
deals with the additional variable as with an additional plaintext in this case. 
Taking the example of AES, this can even be extended to the first mixcolumns 
operation, if 32-bit key hypotheses are performed by the adversary. But the DC 
approach is inherently limited to the exploitation of predictable parts of the key. 
So as soon as the diffusion is complete (which very rapidly occurs in modern 
ciphers and therefore corresponds to most of their intermediate computations), 
the leakages are left unexploited by such strategies. This limitation directly leads 
to the main problem we tackle in this paper, namely: how to efficiently exploit the leakage of any intermediate variable in a side-channel attack?

Previous ASCA were a first attempt to answer this question, by trying to solve 
a system of equations describing the target cryptographic algorithm, comple- 
mented with the information extracted. These attacks typically begin by sieving 
intermediate values, keeping only the most probable ones. A usual approach is to 
coalesce the leakages by Hamming weights for this purpose. The set of remain- 
ing values is then verified one against another (e.g. using heuristic SAT-solvers). 
Unfortunately, this algebraic approach cannot easily deal with the probability 
distributions output by TA, which are thus discretized and sieved. Whenever the 
measurement noise is not negligible, this introduces “errors” that are fatal to algebraic solvers. As mentioned in the introduction, optimizers allow mitigating this
problem, but are still limited in the exhaustive way they encode the probabilities 
(which is too expensive for describing more than a couple of AES rounds). 
Our method works differently, by operating directly on the posterior distributions of the intermediate values extracted from leakage traces, and propagating the information throughout the computation steps of the algorithm. When attacking a cryptographic implementation, we first build a large graphical model containing the intermediate variables, which are linked by constraints corresponding to the atomic operations executed. For instance, the exclusive-OR and SBOX functions are usually found in software implementations of the AES. Next, the goal is to find the marginal distribution of the key, given the distributions of all the intermediate variables. While this is generally a hard problem, we observe that an important feature of cryptographic algorithms is that intermediate values tend to appear only in a few places. A similar behavior is present in Gallager codes [7], also called Low-Density Parity Check codes (LDPC). In such a code, codeword bits are linked together by a small number of parity constraints (i.e. linear in the codeword size). Decoding such a construction is generally performed via application of the BP algorithm, also known as the sum-product algorithm. Our application in the following sections is a (conceptually) simple extension, where values are not limited to bits, and parity constraints go beyond exclusive-ORs.


2.2 The Belief-Propagation Algorithm 

Our description of the BP algorithm is largely based on the (excellent) description provided in [?, Chapter 26]. Let us consider a set of $N$ variables $\mathbf{x} = \{x_n\}_{n=1}^{N}$, and define a function $P^*$ of $\mathbf{x}$ which is a product of $M$ factors:

$$P^*(\mathbf{x}) = \prod_{m=1}^{M} f_m(\mathbf{x}_m),$$

where each factor $f_m(\mathbf{x}_m)$ is a function of a subset $\mathbf{x}_m$ of $\mathbf{x}$. The $P^*$ function is typically depicted using a factor graph, in which circles correspond to variables $x_i$ and squares to functions $f_m$. An edge is drawn between $x_i$ and $f_m$ if $x_i \in \mathbf{x}_m$, meaning that the $m$-th factor depends on the $i$-th variable. For example, the parity functions of a simple 3-repetition code are given below; its factor graph connects each $f_i$ ($i = 1, 2, 3$) to $x_i$, $f_4$ to $x_1$ and $x_2$, and $f_5$ to $x_2$ and $x_3$:


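$$f_1(x_1) = \Pr(x_1), \qquad f_2(x_2) = \Pr(x_2), \qquad f_3(x_3) = \Pr(x_3),$$

$$f_4(x_1, x_2) = \begin{cases} 1 & \text{if } x_1 \oplus x_2 = 0, \\ 0 & \text{otherwise,} \end{cases} \qquad f_5(x_2, x_3) = \begin{cases} 1 & \text{if } x_2 \oplus x_3 = 0, \\ 0 & \text{otherwise.} \end{cases}$$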




The task we are interested in is that of marginalization. That is, we aim to compute the following function:

$$Z_n(x_n) = \sum_{\{x_{n'}\},\, n' \neq n} P^*(\mathbf{x}),$$

and more importantly its normalized version $P_n(x_n) = Z_n(x_n)/Z$, where:

$$Z = \sum_{\mathbf{x}} \prod_{m=1}^{M} f_m(\mathbf{x}_m).$$

These tasks are intractable in general. Even when the factor functions are limited to three variables, the cost of computing the exact marginal is believed to grow exponentially with the number of variables $N$. The BP algorithm can circumvent this problem and compute marginals efficiently as long as the factor graph is tree-like. We will denote by $\mathcal{N}(m)$ the set of variables involved in factor $f_m$, by $\mathcal{M}(n)$ the set of factors where variable $x_n$ appears, and write the set of variables in $\mathbf{x}_m$ with $x_n$ excluded as $\mathbf{x}_{m \setminus n} = \{x_{n'} : n' \in \mathcal{N}(m) \setminus n\}$. The algorithm works by passing two types of messages along the edges of the factor graph, from variables to factors ($q_{n \to m}$) and from factors to variables ($r_{m \to n}$). The sets of messages are updated using two rules:

$$q_{n \to m}(x_n) = \prod_{m' \in \mathcal{M}(n) \setminus m} r_{m' \to n}(x_n),$$

$$r_{m \to n}(x_n) = \sum_{\mathbf{x}_{m \setminus n}} \left( f_m(\mathbf{x}_m) \prod_{n' \in \mathcal{N}(m) \setminus n} q_{n' \to m}(x_{n'}) \right).$$

Convergence should occur after a finite number of iterations, at most equal to the length of the longest path. Once the network has converged, the marginal function (also called belief) of a variable $x_n$ can be recovered by multiplying together all incoming messages at the corresponding node:

$$Z_n(x_n) = \prod_{m \in \mathcal{M}(n)} r_{m \to n}(x_n).$$

The normalized value $P_n(x_n) = Z_n(x_n)/Z$ is easily obtained by summing together the marginal functions, $Z = \sum_{x_n} Z_n(x_n)$. As already mentioned, the BP algorithm returns the exact marginals as long as the factor graph is tree-shaped. Yet, in many useful cases such as decoding, the graph contains cycles. Fortunately, BP can be applied directly on general factor graphs as well, giving rise to the so-called “loopy” BP. While this version is not guaranteed to return the correct marginals, and may not even converge to a fixed point in some cases, it usually gives results that are good enough for most applications.
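To make the two update rules concrete, here is a minimal Python sketch of sum-product BP with a flooding schedule (all factor-to-variable messages, then all variable-to-factor messages, at each iteration), applied to the 3-repetition code above and checked against brute-force marginalization. Messages are renormalized for numerical stability, which does not change the normalized beliefs. The priors 0.9, 0.9 and 0.2 for $x_i = 1$ are hypothetical channel outputs, and the function names are illustrative.

```python
import itertools
import math

def brute_force_marginals(domains, factors):
    """Exact P_n(x_n) by summing P*(x) = prod_m f_m(x_m) over all configurations."""
    Zn = [[0.0] * d for d in domains]
    for x in itertools.product(*(range(d) for d in domains)):
        p = math.prod(f(*(x[v] for v in vs)) for vs, f in factors)
        for n, xn in enumerate(x):
            Zn[n][xn] += p
    return [[z / sum(row) for z in row] for row in Zn]

def _normalize(msg):
    s = sum(msg)
    return [m / s for m in msg] if s > 0 else msg

def sum_product(domains, factors, iterations=10):
    """BP on a factor graph given as (variable-index tuple, factor function) pairs."""
    q = {(n, m): [1.0] * domains[n] for m, (vs, _) in enumerate(factors) for n in vs}
    r = {(m, n): [1.0] * domains[n] for m, (vs, _) in enumerate(factors) for n in vs}
    for _ in range(iterations):
        # Factor to variable: r_{m->n}(x_n) = sum_{x_m\n} f_m(x_m) prod_{n' != n} q_{n'->m}(x_n')
        for m, (vs, f) in enumerate(factors):
            for pos, n in enumerate(vs):
                msg = [0.0] * domains[n]
                for x in itertools.product(*(range(domains[v]) for v in vs)):
                    w = f(*x)
                    for pos2, v in enumerate(vs):
                        if pos2 != pos:
                            w *= q[(v, m)][x[pos2]]
                    msg[x[pos]] += w
                r[(m, n)] = _normalize(msg)
        # Variable to factor: q_{n->m}(x_n) = prod_{m' != m} r_{m'->n}(x_n)
        for (n, m) in q:
            msg = [1.0] * domains[n]
            for (m2, n2), rm in r.items():
                if n2 == n and m2 != m:
                    msg = [a * b for a, b in zip(msg, rm)]
            q[(n, m)] = _normalize(msg)
    beliefs = []
    for n, d in enumerate(domains):
        b = [1.0] * d  # Z_n(x_n) = product of all incoming factor messages
        for (m, n2), rm in r.items():
            if n2 == n:
                b = [a * c for a, c in zip(b, rm)]
        beliefs.append(_normalize(b))
    return beliefs

def prior(p_one):
    """Unary factor f_i(x_i) = Pr(x_i), with Pr(x_i = 1) = p_one."""
    return lambda x: p_one if x == 1 else 1.0 - p_one

def equal(a, b):
    """Parity factor: 1 if a XOR b = 0, 0 otherwise."""
    return 1.0 if a ^ b == 0 else 0.0

domains = [2, 2, 2]
factors = [((0,), prior(0.9)), ((1,), prior(0.9)), ((2,), prior(0.2)),
           ((0, 1), equal), ((1, 2), equal)]
beliefs = sum_product(domains, factors)
exact = brute_force_marginals(domains, factors)
```

Since this factor graph is a tree, BP returns the exact marginals: every bit equals 1 with probability $0.162/0.170 \approx 0.953$, the third (noisier) observation being outvoted by the two others.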


2.3 Efficient Representation of an AES Implementation 

Our method for SASCA consists in an application of the BP algorithm to the decoding of keys using plaintexts, ciphertexts and side-channel traces. In this section, we illustrate it in the context of an implementation of the AES in an 8-bit device. For this purpose, the $x_i$ variables defined in the description of the BP algorithm will represent the intermediate values handled by the cryptographic algorithm, and the parity functions will be separated into two sets:
- The first set corresponds to the a priori knowledge on the variables acquired through side-channel leakages, denoted as $f_i(v) = \Pr[x_i = v \mid L]$.

- The second set corresponds to the operations executed by the implementation. In the case of a binary operation $\mathrm{Op}(x_{i_1}, x_{i_2})$ whose result is $x_{i_3}$, the function is defined by:

$$f_{\mathrm{Op}}(x_{i_1}, x_{i_2}, x_{i_3}) = \begin{cases} 1 & \text{if } \mathrm{Op}(x_{i_1}, x_{i_2}) = x_{i_3}, \\ 0 & \text{otherwise.} \end{cases}$$

Based on these notations, an adversary first has to encode the AES computations 
in a form that is compatible with the BP algorithm. For illustration, and because 
it is publicly available, we will describe how to build a factor graph for the AES
FURIOUS implementation (http://point-at-infinity.org/avraes). 

Concretely, our program takes as input a description which is very similar to the assembly code of AES FURIOUS, with the memory-related operations left out, but where any assignment requires a newly named variable. Namely, variable nodes are denoted by names starting with a capital letter, such as K[2,4]_0 for the intermediate key in row 2 and column 4 of key scheduling round 0, which also happens to be the second master key byte (noted $K^0_{2,4}$ in the factor graph), or SB[2,1]_0 for the SBOX output in row 2 and column 1 of round 0 ($SB^0_{2,1}$ in the factor graph). These variable nodes correspond to intermediate values computed during encryption, such as the state (ST), key addition or MIXCOLUMNS intermediate results (AK and MC), outputs of xtime operations (XT), and so on. In addition, factor node names start with an underscore, such as _Xor (exclusive-OR) and _Xtime (polynomial multiplication by x). They correspond to instructions executed during encryption. For example, Table 1 gives samples of the correspondence between the assembly code, input description and factor graph. Note that the factor nodes for the prior probabilities of the variables are not drawn.


Table 1. Factor graph representation of an AES encryption

Assembly code      Graph description                          Factor graph

ld   H1, Y+
eor  ST11, H1      _Xor AK[1,1]_0 ST[1,1]_0 K[1,1]_0          ST[1,1]_0, K[1,1]_0 -> XOR -> AK[1,1]_0
mov  ZL, ST11
lpm  ST11, Z       _Sbox SB[1,1]_0 AK[1,1]_0                  AK[1,1]_0 -> SBOX -> SB[1,1]_0

mov  H3, ST11
eor  H3, ST21      _Xor MC[3,1]_0 SB[1,1]_0 SB[2,1]_0         SB[1,1]_0, SB[2,1]_0 -> XOR -> MC[3,1]_0
mov  ZL, H3
lpm  H3, Z         _Xtime XT[1,1]_0 MC[3,1]_0                 MC[3,1]_0 -> XTIME -> XT[1,1]_0

mov  ZL, ST24
lpm  H3, Z         _Sbox SK[1,1]_1 K[2,4]_0                   K[2,4]_0 -> SBOX -> SK[1,1]_1
eor  ST11, H3      _Xor XK[1,1]_1 SK[1,1]_1 K[1,1]_0          SK[1,1]_1, K[1,1]_0 -> XOR -> XK[1,1]_1
eor  ST11, H1      _XorCst K[1,1]_1 XK[1,1]_1 0x1             XK[1,1]_1 -> XORCST (0x1) -> K[1,1]_1


There are two notable differences between SASCA and the classical decoding of LDPC codes. First, variable nodes are not binary digits, but rather elements of GF($2^8$). Second, factor nodes are not limited to exclusive-ORs, but may include any of the variety of functions used in cryptographic implementations (e.g. XOR, SBOX, xtime). However, these factor nodes are not much more complex than for classical decoding, as illustrated with our three previous examples:


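$$\mathrm{XOR}(A, B, C) = \begin{cases} 1 & \text{if } A \oplus B = C, \\ 0 & \text{otherwise,} \end{cases}$$

$$\mathrm{sbox}(A, B) = \begin{cases} 1 & \text{if } A = S(B), \\ 0 & \text{otherwise,} \end{cases}$$

$$\mathrm{xtime}(A, B) = \begin{cases} 1 & \text{if } A = \mathrm{Xt}(B), \\ 0 & \text{otherwise.} \end{cases}$$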
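Plugging these factors into the rule for $r_{m \to n}$ gives simple closed forms, as the following minimal Python sketch shows for distributions over GF($2^8$) stored as lists of 256 probabilities. For the XOR factor, the message to $C$ is the XOR-convolution of the messages from $A$ and $B$ (and, since $A = B \oplus C$, the messages to $A$ or $B$ are the same convolution of the two other inputs). For a table-based factor $1[A = T(B)]$, such as sbox or xtime, probability mass is simply pushed through the lookup table. The function names are hypothetical, and only the xtime table is built here for the example; an S-box table would be passed in the same way.

```python
# xtime(a): multiplication by x in GF(2^8) with the AES reduction polynomial.
XTIME = [((x << 1) ^ 0x1B) & 0xFF if x & 0x80 else x << 1 for x in range(256)]

def xor_to_output(q_a, q_b):
    """Message from XOR(A, B, C) to C: r(c) = sum_a q_A(a) * q_B(a ^ c)."""
    r = [0.0] * 256
    for a, pa in enumerate(q_a):
        if pa:
            for b, pb in enumerate(q_b):
                r[a ^ b] += pa * pb
    return r

def table_factor_messages(q_in, q_out, table):
    """Messages of a factor 1[A = table[B]] (B input, A output).

    To A: r(a) = sum over b with table[b] = a of q_B(b).
    To B: r(b) = q_A(table[b]).
    """
    to_out = [0.0] * 256
    for b, pb in enumerate(q_in):
        to_out[table[b]] += pb
    to_in = [q_out[table[b]] for b in range(256)]
    return to_out, to_in

def point_mass(value):
    return [1.0 if x == value else 0.0 for x in range(256)]

uniform = [1.0 / 256] * 256
r_xor = xor_to_output(point_mass(0x12), point_mass(0x34))
to_out, to_in = table_factor_messages(point_mass(0x57), point_mass(0xAE), XTIME)
```

The cost of the XOR message is $2^{16}$ operations per edge and that of a table factor is $2^8$, which is what keeps BP on the AES factor graph tractable.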


This natural representation of operations is very efficient, as opposed to the contrived way AES encryptions are translated to SAT instances (roughly, it corresponds to 1,200 equations and variables in GF($2^8$) compared to 18,000 equations in 10,000 variables in SAT-based ASCA). Taking advantage of it, the SASCA adversary then tries to compute the key marginal probabilities $P_n(K)$ given the leakages. For this purpose, one simply has to incorporate the implicit factor nodes corresponding to prior knowledge on variable nodes, as given by the templates of the side-channel attack. For instance, the factor for the output of the first SBOX in the first round, $f_m(SB^0_{1,1})$, is the posterior distribution $\Pr[SB^0_{1,1} \mid L]$. In addition, any known value (for instance the plaintext bytes) has a prior with zero entropy, and any value that does not leak (either because it is protected or precomputed) has a uniform prior. Finally, the loopy BP algorithm propagates information throughout the factor graph: if successful (i.e. in case of convergence), it should return the approximate marginal probabilities of the key bytes $P_n(K^0_{1,1})$ to $P_n(K^0_{4,4})$, i.e. the answer we are looking for.


2.4 Attacking with Several Traces 

The ability to efficiently exploit (i.e. combine the information of) several leakage 
traces is one of the reasons that have made DPA attacks so popular - since it 
typically leads to the noise vs. data complexity tradeoff that is at the core of most 
side-channel attacks. It also remains one of the main practical issues for ASCA
and follow-up works. So far, the only way several traces can be useful is when 
they are repetitions of the same encryption (without randomizations), so that the 
noise can be averaged out. By contrast, adding traces corresponding to multiple 
plaintexts could only be managed with the construction of larger systems, that 
are too memory consuming for TASCA, and increasing the probability that one 
piece of hard information in such systems is incorrect for ASCA. 

Interestingly, SASCA are able to improve the key recovery success rate with each additional trace observed. Practically, the factor graph used for decoding is first replicated for each trace. Yet, since the master key stays the same during the course of the attack, the part of our factor graph corresponding to the key scheduling also remains constant: it forms a kind of “backbone” where all the encryption rounds connect, as depicted in Figure 1. As a result, whenever several messages are used, the probability distributions are propagated from each replicated graph towards the key schedule. The impact of such propagation is in fact very similar to the one resulting from using several traces in a classical TA, where probabilities are multiplied together and the success rate increases.

Fig. 1. Factor graph connections for several traces

3 Experimental Results 

We now validate the method described in the previous section with illustrative simulated attacks against the AES FURIOUS implementation. For this purpose, we assume a setup that is essentially similar to the one used to demonstrate the applicability of ASCA to the AES in [22]. The only difference is that we will consider implementations with and without the key scheduling leakages. As previously explained (and illustrated in Table 1), all the operations found in the assembly code are translated into factor nodes, excluding memory-related operations. For illustration, we considered Hamming weight leakages affected by a noise of variance $\sigma^2$, but the attack is independent of this choice: any leakage function could be incorporated without performance penalty. The only important parameter in our case is the informativeness of the leakages which, in the first-order setting we investigate, can be measured with a Signal-to-Noise Ratio (SNR) [16]. Since the signal (i.e. variance) of a Hamming weight leakage function for 8-bit intermediate values equals 2, one can simply derive the SNR as $2/\sigma^2$. As a baseline, we compared our results with those of two standard TA, namely a univariate one exploiting only the first-round S-box output leakages, and a bivariate one exploiting the first-round S-box input and output leakages.
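The SNR definition above can be checked with a short Python sketch: the Hamming weight of a uniform byte follows a binomial distribution with $n = 8$ and $p = 1/2$, whose variance is exactly 2. The leakage simulator uses the additive Gaussian noise model of this section; the function names and sample values are illustrative.

```python
import random

HW = [bin(v).count("1") for v in range(256)]

def hw_signal_variance():
    """Variance of HW(v) for v uniform over bytes, i.e. the 'signal' of the leakage."""
    mean = sum(HW) / 256
    return sum((h - mean) ** 2 for h in HW) / 256

def snr(noise_variance):
    """First-order SNR of a Hamming weight leakage: Var[HW(v)] / sigma^2 = 2 / sigma^2."""
    return hw_signal_variance() / noise_variance

def simulate_leakage(values, noise_variance, rng):
    """Simulated leakage HW(v) + N(0, sigma^2) for each intermediate value v."""
    sigma = noise_variance ** 0.5
    return [HW[v] + rng.gauss(0.0, sigma) for v in values]

rng = random.Random(0)
traces = simulate_leakage([0x00, 0xFF, 0x3C], noise_variance=2 ** 3, rng=rng)  # SNR = 2^-2
```

For instance, a noise variance of $2^3$ corresponds to an SNR of $2^{-2}$, and a noise variance of $2^{-3}$ to an SNR of $2^4$, the upper end of the range considered in the experiments.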

The results of our experiments are shown in Figure 2. The x-axis corresponds to the number of messages used for the attack (in log scale), and the y-axis is a stack of success rate curves for decreasing SNRs (i.e. increasing noise levels). An alternative view is provided in Figure 3, which summarizes these simulation results by showing the data complexity gains of SASCA over TA. It appears from both figures that these gains are significant and consistently observed for any noise level. Finally, the unknown inputs and outputs scenario is detailed for SASCA in Figure 4. We see that its impact is limited if the key scheduling leaks (confirming the results from [15]) and more significant otherwise.

Fig. 2. Attack results for our simulated FURIOUS implementation. Each graph gives the success rate (SR, ranging from 0 to 1) for a given signal-to-noise ratio (SNR, ranging from $2^4$ down to $2^{-6}$) as a function of the number of traces (in logarithmic scale, ranging from 1 to 5000). The attacks are:

- univariate TA targeting the SBOX output (in dark gray),

- bivariate TA targeting the SBOX input and output (in blue),

- SASCA ignoring the key schedule leakages (in violet),

- SASCA exploiting all the intermediate values (in orange).
Fig. 3. Data complexity gain of SASCA compared to TA as a function of the SNR, given as the fraction of measurements needed to reach a success rate of 0.9 (same colors as in Figure 2).


Discussion. Compared to previous results in ASCA/TASCA, our new tools
bring two main advantages. First, from the SNR point of view, these works were
typically limited to scenarios where a single leakage trace was enough to recover
the master key (i.e. to SNRs $> 2^2$), whereas we can deal with any SNR. Second, the
time and memory complexity of the BP decoding is much improved compared 
to SAT-solver based ASCA and optimizer-based TASCA. Our implementation 
deals with a factor graph of size proportional to the number of messages, with a 
relatively high (yet easily tractable in practice) constant of approximately 16M 
per message. Its computation time is proportional to both the diameter of the 
graph (constant after the second message) which sets the number of decoding 
iterations, and the number of measurements which sets the amount of messages 
exchanged at each iteration. This makes the evolution of the time and memory 
complexity of SASCA quite comparable to the one in divide and conquer TA 
(i.e. linear in the number of messages). Yet, decoding the AES encryption factor
graph with the BP algorithm implies a larger computation time of approximately
one second per message in our prototype implementation, running on an Intel
i7-2720QM. This (constant) overhead is the main price to pay for the substantially
smaller data complexities of SASCA (i.e. similar to ASCA/TASCA), which are, as
expected, the main advantage of analytical strategies over DC ones.

As detailed in Appendix A, the practical relevance of such attacks is quite
similar to that of TA, since it requires the same profiling assumptions (i.e. the knowledge
of a single key). Admittedly, the profiling effort is significantly more expensive 
for SASCA, since it requires characterizing all the target intermediate values. 
But since all these target values can be profiled independently, building their 
templates can be done quite efficiently (with essentially the same amount of 
measurements as needed to characterize the first-round operations exploited in 
TA), and is easily automated with standard side-channel attack techniques. 
Fig. 4. SASCA with unknown input and output for different SNRs. The x-axis is the
number of traces used for the attack (in log scale), and the y-axis gives the probability
of key recovery. The top graph corresponds to a leaky key schedule, and the lower
graph gives the results for a leak-free key schedule.


4 Conclusions 

By modeling the side-channel analysis problem adequately, SASCA provide the
missing link between standard DC distinguishers and analytical strategies for
key recovery. As a result, and for the first time, we are able to efficiently exploit
the probabilistic information of all the leaking operations in a software imple- 
mentation. Our resulting attacks are optimal in data complexity and efficient in 
time and memory. Yet, we note that the tools exploited in this first instantiation 
of SASCA can certainly be improved. For example, the BP algorithm performs 
too many computations for our needs. Indeed, it propagates every distribution 
throughout the factor graph whereas in practice, we are mostly interested in 
the key. Hence, further work could propagate messages only
towards the key schedule (i.e. perform Bayesian inference). This would additionally
allow the attack to be performed one message at a time, by accumulating in- 
formation retrieved from each trace onto the nodes of the key schedule, hence 
reducing the memory requirements to that of a single trace (i.e. 16M). 

In view of the improved noise robustness of SASCA, an important open prob- 
lem is to determine whether the strong results obtained with this new type of 
analytical strategy also apply to implementations protected with countermea- 
sures. Masking, shuffling and leakage-resilient cryptography appear as the most 
interesting targets in this respect. Besides, the experiments in this work consid- 
ered a worst-case scenario where the adversary could take advantage of all the 
leaking operations of an AES implementation (i.e. assuming the knowledge of 
the source code, essentially). But the investigation of an intermediate scenario
where the adversary would exploit fewer leaking observations (e.g. the ones he
could guess without knowing the source code), and of its resulting time and data
complexity, is another interesting direction for further investigation.
A Attack Requirements

In this section, we provide a brief discussion of the profiling step that precedes 
the application of SASCA. In particular, we argue that the profiling overhead 
and required knowledge for this purpose are similar to those of standard TA. 

Profiling Overhead. Similarly to classical TA, SASCA require profiling the leakage corresponding to their target intermediate values. In this respect, the only
difference is that they can take advantage of many such values, whereas DC
strategies only exploit the first-round(s) leakages. In general, one can assume
that all target intermediate values leak a similar amount of information. And if
it is not the case, it is usually the first-round(s) leakages that have lower SNRs.
As a result, and given that the set of profiling traces corresponds to random 
inputs, one can essentially build all the SASCA templates with the same traces 
as for a TA, by simply re-organizing these traces according to the target inter- 
mediate values. This process can be automated based on the implementation 
knowledge, and its computational cost grows linearly with the number of targets. Concretely, this cost should be small for most concrete implementations,
and if needed it can be sped up by assuming that sets of intermediate values leak
according to the same model (possibly at the cost of some information loss).
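The re-organization step can be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes univariate Gaussian templates (one mean and one variance per value of each target), a single point of interest per target, and a hypothetical data layout in which each profiling trace is a dict mapping target names to leakage samples.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def build_templates(samples, values):
    """Univariate Gaussian templates for one target intermediate value:
    group the leakage samples by the value the target took in each trace,
    then estimate a mean and a variance per group."""
    groups = defaultdict(list)
    for sample, value in zip(samples, values):
        groups[value].append(sample)
    return {v: (fmean(s), pvariance(s)) for v, s in groups.items()}


def build_all_templates(traces, targets):
    """Re-use the same profiling traces for every target.

    traces:  list of dicts {target name: leakage sample at that target's
             point of interest} (hypothetical layout)
    targets: dict {target name: list of known values, one per trace},
             e.g. computed from the random inputs and the profiling key
    The cost is one regrouping pass per target, i.e. linear in their number.
    """
    return {name: build_templates([t[name] for t in traces], values)
            for name, values in targets.items()}
```

The same list of traces is passed once per target; only the grouping key changes, which is the point made in the paragraph above.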

Required Knowledge. Since templates are built by grouping the leakage traces 
according to some target intermediate values, it requires being able to predict 
these values. Both for TA and SASCA, this is usually achieved thanks to some 
key knowledge (or a profiling device). So both attacks can be based on the same 
assumptions. In fact, their main difference is that any intimate knowledge of the 
target implementations can - but does not have to - be exploited by SASCA 
(while, e.g. the middle round leakages are useless for DC attacks). The experi- 
ments in this paper consider a worst-case scenario where the adversary knows 
the implementation source code. Another extreme scenario would be to consider
only “standard” attack points that can be guessed from the algorithm's specifications (e.g. S-box inputs/outputs), which would reduce the gain of SASCA
compared to TA. Any intermediate situation could be investigated, correspond- 
ing to various tradeoffs between implementation details and attack efficiency. 
Abstract. The Double-Base Number System (DBNS) uses two bases, 2
and 3, in order to represent any integer n. A Double-Base Chain (DBC)
is a special case of a DBNS expansion. DBCs have been introduced
to speed up the scalar multiplication [n]P on certain families of elliptic curves used in cryptography. In this context, our contributions are
twofold. First, given integers n, a, and b, we outline a recursive algorithm to compute the number of different DBCs with a leading factor
dividing $2^a 3^b$ and representing n. A simple modification of the algorithm
makes it possible to determine the number of DBCs with a specified length as well
as the actual expansions. In turn, this gives rise to a method to compute
an optimal DBC representing n, i.e. an expansion with minimal length.
Our implementation is able to return an optimal expansion for most
integers of up to 60 bits in a few minutes. Second, we introduce an original and potentially more efficient approach to compute a random scalar
multiplication [n]P, based on the concept of controlled DBC. Instead of
generating a random integer n and then trying to find an optimal, or
at least a short, DBC to represent it, we propose to directly generate n
as a random DBC with a chosen leading factor $2^a 3^b$ and length $\ell$. To
inform the selection of those parameters, in particular $\ell$, which drives
the trade-off between the efficiency and the security of the underlying
cryptosystem, we enumerate the total number of DBCs having a given
leading factor $2^a 3^b$ and a certain length $\ell$. The comparison between this
total number of DBCs and the total number of integers that we wish
to represent a priori provides some guidance regarding the selection of
suitable parameters. Experiments indicate that our new Near Optimal
Controlled DBC approach provides a speedup of at least 10% with respect to the NAF for sizes from 192 to 512 bits. Computations involve
elliptic curves defined over $\mathbb{F}_p$, using the Inverted Edwards coordinate
system and state-of-the-art scalar multiplication techniques.

Keywords: Double-base number system, elliptic curve cryptography. 


1 Introduction 

1.1 Elliptic Curve Cryptography 

An elliptic curve E defined over a field K is a nonsingular projective plane cubic 
together with a point with coordinates in K. For cryptographic applications, the 
field K is always finite. In practice, it is a large prime field $\mathbb{F}_p$ or a binary field
$\mathbb{F}_{2^d}$. We refer to [23] for a mathematical presentation of elliptic curves and to
[UCES] for a discussion focused on cryptographic applications.

There are different ways to represent the curve E, in particular with a Weierstrass equation or in Edwards form [1313]. Irrespective of the representation, the
set of points lying on the curve E can be endowed with an abelian group structure. This property has been exploited for about twenty-five years to implement
public-key cryptographic primitives.

The core operation in elliptic curve cryptography is the scalar multiplication,
which consists in computing [n]P given a point P on the curve E and some
integer n. Several methods exist relying on different representations of n. One
of the simplest approaches relies on the non-adjacent form (NAF) [20,19], which
allows one to compute [n]P with t doublings and t/3 additions on average, where t
is the binary length of n. The approach discussed next is more sophisticated and 
has recently received increasing attention. 
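For reference, the standard NAF recoding mentioned above can be sketched in Python: digits lie in $\{-1,0,1\}$, no two adjacent digits are nonzero, and $n = \sum_i d_i 2^i$. `naf_cost` is a hypothetical helper that counts the doublings and additions of a left-to-right double-and-add over these digits.

```python
def naf(n):
    """Non-adjacent form of n >= 0, least significant digit first:
    digits d_i in {-1, 0, 1}, n = sum d_i 2^i, no two adjacent nonzero digits."""
    digits = []
    while n > 0:
        if n % 2:
            d = 2 - (n % 4)      # 1 if n = 1 mod 4, -1 if n = 3 mod 4
            n -= d               # makes n divisible by 4, forcing a 0 next
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def naf_cost(n):
    """(doublings, additions) of a left-to-right double-and-add over naf(n), n >= 1."""
    digits = naf(n)
    return len(digits) - 1, sum(1 for d in digits if d) - 1


print(naf(7), naf_cost(7))   # [-1, 0, 0, 1] (3, 1), i.e. 7 = 8 - 1
```

On average about one digit in three is nonzero, which is where the $t/3$ additions come from.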


1.2 Double-Base Number System 


The Double-Base Number System (DBNS) was introduced by Dimitrov and
Cooklev [5] and later used in the context of elliptic curve cryptography [6].
With this system, an integer n is represented as

$$n = \sum_{i=1}^{\ell} c_i 2^{a_i} 3^{b_i}, \quad \text{with } c_i \in \{-1, 1\}. \qquad (1)$$

This representation is highly redundant and an expansion can easily be found
with a greedy-type approach. The principle is to find at each step the best
approximation of a given integer in terms of a {2,3}-integer, i.e. an integer of
the form $2^a 3^b$. Then compute the difference and reapply the process until we
reach zero.
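A minimal Python sketch of this greedy principle follows. The tie-breaking rule (keep the first closest candidate found) and the absence of any bound on the exponents are assumptions of the sketch, so it need not reproduce the expansion of Example 1 term by term; it only guarantees that the signed terms sum to n.

```python
def closest_23_integer(m):
    """Exponents (a, b) of a {2,3}-integer 2^a 3^b closest to m >= 1
    (ties: the first candidate found is kept)."""
    best = None                               # (distance, a, b)
    b, q = 0, 1                               # q = 3^b
    while q <= 2 * m:
        a = 0
        while (q << (a + 1)) <= m:            # largest a with q 2^a <= m (0 if q > m)
            a += 1
        for cand in (a, a + 1):               # the two neighbours of m for this b
            dist = abs(m - (q << cand))
            if best is None or dist < best[0]:
                best = (dist, cand, b)
        b, q = b + 1, 3 * q
    return best[1], best[2]


def greedy_dbns(n):
    """Greedy signed DBNS expansion: list of (c, a, b) with n = sum c 2^a 3^b."""
    terms = []
    sign, m = (1, n) if n >= 0 else (-1, -n)  # remaining value is sign * m
    while m:
        a, b = closest_23_integer(m)
        z = 2 ** a * 3 ** b
        terms.append((sign, a, b))
        if z > m:                             # overshooting flips the sign
            sign = -sign
        m = abs(m - z)
    return terms
```

The loop terminates because the power of 2 just below m is always a candidate, so the remainder strictly decreases at each step.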

Example 1. Applying this approach to n = 542788, we find that

$$542788 = 2^8 3^7 - 2^3 3^7 + 2^4 3^3 - 2 \cdot 3^2 - 2.$$

In m, Dimitrov et al. show that for any integer n, this greedy approach returns
a DBNS expansion of n having at most $O\left(\frac{\log n}{\log \log n}\right)$ terms. However, in general
this system is not well suited for scalar multiplications. For instance, in order to
compute [542788]P from the DBNS expansion given in Example 1, it seems that
we need more than 8 doublings and 7 triplings unless we can use extra storage
to keep certain intermediate results. But if we are lucky enough that the terms
in the expansion can be ordered in such a way that their powers of 2 and 3 are
both decreasing, then it becomes trivial to obtain [n]P.


On the Enumeration of Double-Base Chains with Applications 




1.3 Double-Base Chain 

The concept of Double-Base Chain (DBC), introduced in [6], corresponds to an
expansion of the form

$$\sum_{i=1}^{\ell} c_i 2^{a_i} 3^{b_i}, \quad \text{with } c_i \in \{-1, 1\} \qquad (2)$$

such that

$$a_1 \geq a_2 \geq \cdots \geq a_\ell \quad \text{and} \quad b_1 \geq b_2 \geq \cdots \geq b_\ell. \qquad (3)$$

Equivalently, (3) means that $2^{a_\ell} 3^{b_\ell} \mid \cdots \mid 2^{a_2} 3^{b_2} \mid 2^{a_1} 3^{b_1}$. It guarantees that
exactly $a_1$ doublings, $b_1$ triplings, $\ell - 1$ additions, and at most two variables are
sufficient to compute [n]P. It is straightforward to adapt the greedy algorithm
to return a DBC.

Example 2. A modified greedy algorithm returns the following DBC

$$542788 = 2^{14} 3^3 + 2^{12} 3^3 - 2^{10} 3^2 - 2^{10} + 2^6 + 2^2.$$

The DBC expansion returned by the greedy approach is always at least as long
as its DBNS counterpart. Furthermore, it has been shown in m that for any
size $\ell$, there exists an $\ell$-bit integer n such that any DBC representing n needs
at least $\Omega(\ell)$ terms. But the DBC has the advantage of offering a much more
direct and easy way to compute a scalar multiplication. The most natural approach is probably to proceed from right to left. With this method, each term
$2^{a_i} 3^{b_i}$ is computed individually and all the terms are added together. This can
be implemented using two variables. The left-to-right method, which can be
seen as a Horner-like scheme, needs only one variable. Simply initialize it with
$[c_1 2^{a_1 - a_2} 3^{b_1 - b_2}]P$, then add $c_2 P$ and apply $[2^{a_2 - a_3} 3^{b_2 - b_3}]$ to the result. Repeating this process eventually gives [n]P, as illustrated with the chain of Example 2:

$$[542788]P = [2^2]\Big([2^4]\big([2^4]\big([3^2]\big([2^2 3]([2^2]P + P) - P\big) - P\big) + P\big) + P\Big).$$
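The left-to-right evaluation can be sketched in Python, using plain integers as a stand-in for curve points (with P = 1, [k]P is just k). The chain format, a list of $(c_i, a_i, b_i)$ with the leading term first, is an assumption of this sketch. It also counts operations, which should come out as $a_1$ doublings, $b_1$ triplings and $\ell - 1$ additions.

```python
def is_dbc(chain):
    """Condition (3) plus Remark 1: exponents non-increasing, terms distinct."""
    pairs = [(a, b) for _, a, b in chain]
    ordered = all(a1 >= a2 and b1 >= b2
                  for (a1, b1), (a2, b2) in zip(pairs, pairs[1:]))
    return ordered and len(set(pairs)) == len(pairs)


def horner_dbc(chain, P=1):
    """Left-to-right evaluation of a DBC [(c_1, a_1, b_1), ..., (c_l, a_l, b_l)].

    Integers stand in for points: doubling is 2x, tripling is 3x and adding
    c P is x + c P. Returns ([n]P, doublings, triplings, additions).
    """
    if not chain or not is_dbc(chain):
        raise ValueError("not a double-base chain")
    c1, a_prev, b_prev = chain[0]
    acc, dbl, tpl, add = c1 * P, 0, 0, 0
    for c, a, b in list(chain[1:]) + [(0, 0, 0)]:   # sentinel: final [2^a_l 3^b_l]
        for _ in range(a_prev - a):
            acc, dbl = 2 * acc, dbl + 1
        for _ in range(b_prev - b):
            acc, tpl = 3 * acc, tpl + 1
        if c:                                        # the sentinel adds nothing
            acc, add = acc + c * P, add + 1
        a_prev, b_prev = a, b
    return acc, dbl, tpl, add


example2 = [(1, 14, 3), (1, 12, 3), (-1, 10, 2), (-1, 10, 0), (1, 6, 0), (1, 2, 0)]
print(horner_dbc(example2))   # (542788, 14, 3, 5)
```

The DBNS expansion of Example 1 is rejected by `is_dbc`, since its powers of 2 are not non-increasing.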

Note that there are other methods to compute a DBC, see for instance a
tree-based algorithm developed in [9]. There also exist several variants and generalizations of the DBC. For instance, the extended DBC m relies on nontrivial
coefficients and precomputed points in order to obtain shorter chains. There is
also a notion of joint DBC El El for double scalar multiplications of the form
[n]P + [m]Q. Next, we are interested in finding the best possible chains for a given
integer. To this end, we introduce the following definitions.

Definition 1. We call the largest {2,3}-integer of a DBC in absolute
value, i.e. $2^{a_1} 3^{b_1}$ in (2), the leading factor of the chain. It encapsulates the total
number of doublings and of triplings necessary to compute [n]P.

Among all the different DBCs with a leading factor dividing $2^a 3^b$ and representing n, the DBCs with minimal length play a special role as they minimize the
number of additions required to compute [n]P. This observation gives rise to the
following definition.
Definition 2. Given integers a, b, and n, a DBC with a leading factor dividing
$2^a 3^b$ and representing n is said to be optimal for n if its length $\ell$ is minimal
across all the DBCs with leading factor dividing $2^a 3^b$ and representing n.

Remark 1. For the purpose of this study, we slightly modify the definitions of
a Double-Base expansion and of a DBC so that we can precisely and meaningfully enumerate them. Concretely, we require that each term $2^{a_i} 3^{b_i}$ appears at
most once in any expansion or chain. In practice, expansions always fulfill this
property. Also, this requirement is not a real constraint since $2^{a_i} 3^{b_i} + 2^{a_i} 3^{b_i} = 2^{a_i + 1} 3^{b_i}$. From now on, when we use the terms double-base expansion or DBC,
this restriction is implied.

Definition 3. An unsigned Double-Base Chain is a DBC of the form (2) such
that all the coefficients $c_i$ are equal to 1 and satisfying (3).

Some properties of the set containing all the unsigned DBCs of a given integer n,
in particular its structure and cardinality, are studied in m. Next, we investigate
the number of signed DBCs representing a given integer.

2 Enumerating DBCs Representing a Given Integer 

2.1 Partition Problem 

Given an integer n, the number p(n) of partitions of n of the form

$$n = d_k + \cdots + d_2 + d_1 \quad \text{with } d_1 \mid d_2 \mid \cdots \mid d_k$$

is studied by Erdős and Loxton in [14]. The authors also introduce $p_1(n)$ as the
number of partitions of n of the form $n = d_k + \cdots + d_2 + 1$ with $d_1 \mid d_2 \mid \cdots \mid d_k$.
They observe that $p(n) = p_1(n) + p_1(n + 1)$ and that

$$p_1(n) = \sum_{d \mid n-1,\ d > 1} p_1\!\left(\frac{n-1}{d}\right).$$



2.2 Enumerating DBCs 

Mimicking their approach, we introduce q(a,b,n), the number of signed partitions of n of the form

$$n = d_k \pm d_{k-1} \pm \cdots \pm d_2 \pm d_1 \quad \text{with } d_1 \mid d_2 \mid \cdots \mid d_k \mid 2^a 3^b.$$

Clearly, q(a,b,n) corresponds to the number of DBCs with a leading factor
dividing $2^a 3^b$ and representing n. Note that in the signed version, it is necessary
to take into account a and b, the largest powers of 2 and 3. Indeed, we observe
that $1 = 2^k - \sum_{i=0}^{k-1} 2^i$ for any k > 0. This shows that the number of signed
representations of any integer is infinite. Obviously, the problem disappears when
we bound the leading factor of an expansion by $2^a 3^b$. Similarly, we introduce
$q_1(a,b,n)$ as the number of partitions of n of the form

$$n = d_k \pm d_{k-1} \pm \cdots \pm d_2 + 1 \quad \text{with } d_2 \mid \cdots \mid d_k \mid 2^a 3^b$$

and $\bar{q}_1(a,b,n)$ as the number of partitions of n of the form

$$n = d_k \pm d_{k-1} \pm \cdots \pm d_2 - 1 \quad \text{with } d_2 \mid \cdots \mid d_k \mid 2^a 3^b.$$

In the following, we denote the valuations of u at 2 and 3 by $\mathrm{val}_2(u)$ and $\mathrm{val}_3(u)$,
respectively.

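Proposition 1. We have

1. $q(a,b,n) = q_1(a,b,n) + \bar{q}_1(a,b,n) + \bar{q}_1(a,b,n-1)$.

2. $$q_1(a,b,n) = \sum_{\substack{d \mid \gcd(n-1,\, 2^a 3^b)\\ d > 1}} q_1\!\left(a - \mathrm{val}_2(d),\, b - \mathrm{val}_3(d),\, \frac{n-1}{d}\right) + \sum_{\substack{d \mid \gcd(n-1,\, 2^a 3^b)\\ d > 1}} \bar{q}_1\!\left(a - \mathrm{val}_2(d),\, b - \mathrm{val}_3(d),\, \frac{n-1}{d}\right).$$

3. $$\bar{q}_1(a,b,n) = \sum_{\substack{d \mid \gcd(n+1,\, 2^a 3^b)\\ d > 1}} q_1\!\left(a - \mathrm{val}_2(d),\, b - \mathrm{val}_3(d),\, \frac{n+1}{d}\right) + \sum_{\substack{d \mid \gcd(n+1,\, 2^a 3^b)\\ d > 1}} \bar{q}_1\!\left(a - \mathrm{val}_2(d),\, b - \mathrm{val}_3(d),\, \frac{n+1}{d}\right).$$

4. $q_1(a,b,1) = 1$ if $a \geq 0$ and $b \geq 0$, and $q_1(a,b,1) = 0$ otherwise.

5. $\bar{q}_1(a,b,1) = a$ if $a \geq 0$ and $b \geq 0$, and $\bar{q}_1(a,b,1) = 0$ otherwise.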

Proof.

1. We observe that any DBC representing n must end with 1, $-1$, or a term that
is a nontrivial divisor of the leading factor. These three sets form a partition
of all the DBCs representing n. By definition, the cardinalities of the first two
sets are $q_1(a,b,n)$ and $\bar{q}_1(a,b,n)$. Appending the term $-1$ gives a bijection between this last set
and the set of DBCs representing $n - 1$ ending with $-1$. Note that we could
also compute $q(a,b,n)$ as $q_1(a,b,n) + \bar{q}_1(a,b,n) + q_1(a,b,n+1)$.

2. Let us consider a DBC with a leading factor dividing $2^a 3^b$, ending with 1, and
representing n. Then this DBC can be written $\sum_i c_i 2^{a_i} 3^{b_i} \pm d + 1$ where $d > 1$
and $d \mid 2^{a_i} 3^{b_i}$ for all i. If we denote $\alpha = \mathrm{val}_2(d)$ and $\beta = \mathrm{val}_3(d)$, we see that
the chain $\sum_i c_i 2^{a_i - \alpha} 3^{b_i - \beta} \pm 1$ represents $(n-1)/d$. We note that its leading
factor must divide $2^{a-\alpha} 3^{b-\beta}$ and that it ends with 1 or $-1$. Also, by construction,
the factor d is a divisor of $n - 1$ and of $2^a 3^b$. Conversely, take $d = 2^\alpha 3^\beta$ a
common divisor of $n - 1$ and $2^a 3^b$. Then to any DBC with a leading factor
dividing $2^{a-\alpha} 3^{b-\beta}$, ending with 1 or $-1$, and representing $(n-1)/d$ corresponds a unique DBC
with a leading factor dividing $2^a 3^b$, ending with 1 and representing n.

3. The proof is similar to 2., except that we need to consider DBCs of the form
$\sum_i c_i 2^{a_i} 3^{b_i} \pm d - 1$.

4. We assume that each term $2^{a_i} 3^{b_i}$ appears at most once, cf. Remark 1. With
this constraint in mind, it is easy to check that there is a unique DBC ending
with 1 and representing 1, namely the chain 1.

5. Regarding the DBCs representing 1 and ending with $-1$, we note that for
any k > 0, we have $2^k - \sum_{i=0}^{k-1} 2^i = 1$. In particular, the previous formula
for k = 1 up to a gives rise to a total number of a different DBCs with a
leading factor dividing $2^a 3^b$, ending with $-1$, and representing 1. It is easy
to see that there is no other solution. This shows that $\bar{q}_1(a,b,1) = a$ when
$a \geq 0$ and $b \geq 0$. □

Using Proposition 1, it is possible to compute q(a,b,n) recursively, for any tuple
(a, b, n).

Example 3. We have q(14, 10, 542788) = 2092690. In other words, there are
2092690 different DBCs with a leading factor dividing $2^{14} 3^{10}$ and representing
542788.

Remark 2. The approach is highly recursive, but precomputing small values
can greatly speed up computations. For instance, precomputing $q_1(a,b,n)$ and
$\bar{q}_1(a,b,n)$ for all $(a,b,n) \in [0,30] \times [0,20] \times [1,1000]$ makes it possible to deal with numbers
of size up to 30 bits in a few seconds.


2.3 Enumerating DBCs of Bounded Length 

A simple modification of the algorithm outlined above allows us to determine the
total number of different DBCs of length less than or equal to $\ell$ with a leading factor
dividing $2^a 3^b$ and representing an integer n. Namely, we introduce a new parameter $\ell$ to keep track of the length of the DBC. It is straightforward to check
that

$$q(a,b,\ell,n) = q_1(a,b,\ell,n) + \bar{q}_1(a,b,\ell,n) + \bar{q}_1(a,b,\ell+1,n-1).$$

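Additionally, $q_1(a,b,\ell,n)$ and $\bar{q}_1(a,b,\ell,n)$ satisfy relations similar to the ones
expressed in Proposition 1. For instance,

$$q_1(a,b,\ell,n) = \sum_{\substack{d \mid \gcd(n-1,\, 2^a 3^b)\\ d > 1}} q_1\!\left(a - \mathrm{val}_2(d),\, b - \mathrm{val}_3(d),\, \ell - 1,\, \frac{n-1}{d}\right) + \sum_{\substack{d \mid \gcd(n-1,\, 2^a 3^b)\\ d > 1}} \bar{q}_1\!\left(a - \mathrm{val}_2(d),\, b - \mathrm{val}_3(d),\, \ell - 1,\, \frac{n-1}{d}\right).$$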
Finally, it is easy to see that $q_1(a,b,\ell,1) = \min(1, \max(0, \ell))$ and $\bar{q}_1(a,b,\ell,1) = \min(a, \max(0, \ell - 1))$. This gives rise to Algorithms 1 and 2.

Algorithm 1. $q_1(a, b, \ell, n)$

Input: An integer n and parameters a, b, and $\ell$.

Output: The number of DBCs representing n, ending with 1, having a
leading factor dividing $2^a 3^b$, and a length less than or equal to $\ell$.

1. if $n \leq 0$ or $a < 0$ or $b < 0$ or $\ell \leq 0$ then return 0
2. else if $n = 1$ then
3.     if $a \geq 0$ and $b \geq 0$ then return $\min(1, \max(0, \ell))$
4.     else return 0
5. else if $n > 1$ then
6.     $D \leftarrow \gcd(n - 1, 2^a 3^b)$
7.     $s \leftarrow 0$
8.     for each divisor $d > 1$ of D do
9.         $s \leftarrow s + q_1(a - \mathrm{val}_2(d), b - \mathrm{val}_3(d), \ell - 1, (n-1)/d)$
10.        $s \leftarrow s + \bar{q}_1(a - \mathrm{val}_2(d), b - \mathrm{val}_3(d), \ell - 1, (n-1)/d)$
11.    return s


Algorithm 2. $\bar{q}_1(a, b, \ell, n)$

Input: An integer n and parameters a, b, and $\ell$.

Output: The number of DBCs representing n, ending with $-1$, having
a leading factor dividing $2^a 3^b$, and a length less than or equal to $\ell$.

1. if $n \leq 0$ or $a < 0$ or $b < 0$ or $\ell \leq 0$ then return 0
2. else if $n = 1$ then
3.     if $a \geq 0$ and $b \geq 0$ then return $\min(a, \max(0, \ell - 1))$
4.     else return 0
5. else if $n > 1$ then
6.     $D \leftarrow \gcd(n + 1, 2^a 3^b)$
7.     $s \leftarrow 0$
8.     for each divisor $d > 1$ of D do
9.         $s \leftarrow s + q_1(a - \mathrm{val}_2(d), b - \mathrm{val}_3(d), \ell - 1, (n+1)/d)$
10.        $s \leftarrow s + \bar{q}_1(a - \mathrm{val}_2(d), b - \mathrm{val}_3(d), \ell - 1, (n+1)/d)$
11.    return s
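A direct Python transcription of Algorithms 1 and 2 is sketched below, memoized with `functools.lru_cache` in the spirit of Remark 2, together with $q(a,b,\ell,n)$ from the relation above and the optimal-length search described in Section 2.4. This is an illustrative sketch, not the authors' C++ implementation; the helper names `val`, `divisors_gt1` and `optimal_length` are hypothetical.

```python
from functools import lru_cache
from math import gcd


def val(p, m):
    """p-adic valuation of the integer m > 0."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v


def divisors_gt1(D):
    """Divisors d > 1 of D = 2^x 3^y, returned as (d, val_2(d), val_3(d))."""
    x, y = val(2, D), val(3, D)
    return [(2 ** i * 3 ** j, i, j)
            for i in range(x + 1) for j in range(y + 1) if i or j]


@lru_cache(maxsize=None)
def q1(a, b, l, n):
    """Algorithm 1: DBCs representing n, ending with +1, leading factor
    dividing 2^a 3^b, length <= l."""
    if n <= 0 or a < 0 or b < 0 or l <= 0:
        return 0
    if n == 1:
        return min(1, max(0, l))
    s = 0
    for d, i, j in divisors_gt1(gcd(n - 1, 2 ** a * 3 ** b)):
        m = (n - 1) // d
        s += q1(a - i, b - j, l - 1, m) + q1bar(a - i, b - j, l - 1, m)
    return s


@lru_cache(maxsize=None)
def q1bar(a, b, l, n):
    """Algorithm 2: same as q1, but for DBCs ending with -1."""
    if n <= 0 or a < 0 or b < 0 or l <= 0:
        return 0
    if n == 1:
        return min(a, max(0, l - 1))
    s = 0
    for d, i, j in divisors_gt1(gcd(n + 1, 2 ** a * 3 ** b)):
        m = (n + 1) // d
        s += q1(a - i, b - j, l - 1, m) + q1bar(a - i, b - j, l - 1, m)
    return s


def q(a, b, l, n):
    """All DBCs representing n >= 1 with leading factor dividing 2^a 3^b
    and length <= l."""
    return q1(a, b, l, n) + q1bar(a, b, l, n) + q1bar(a, b, l + 1, n - 1)


def optimal_length(a, b, n):
    """Smallest l with q(a, b, l, n) > 0, or None if no such DBC exists.
    A chain of divisors of 2^a 3^b has at most a + b + 1 terms."""
    for l in range(1, a + b + 2):
        if q(a, b, l, n) > 0:
            return l
    return None
```

Since no chain of divisors of $2^a 3^b$ has more than $a + b + 1$ terms, the unbounded count $q(a,b,n)$ corresponds to `q(a, b, a + b + 1, n)`.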
Example 4. Using Algorithms 1 and 2, we see that among the 2092690 different
DBCs with a leading factor dividing $2^{14} 3^{10}$ and representing 542788, there are
three optimal chains of length 5, 81 chains of length 6, 843 of length 7, 5005 of
length 8, 19715 of length 9, 56148 of length 10, and so on. The total number is
finite since, for instance, there cannot be a DBC of length greater than or equal to 26:
a chain of divisors of $2^{14} 3^{10}$ contains at most 14 + 10 + 1 = 25 terms.

2.4 Optimal DBCs 

Using the algorithms described in the previous part, it is simple to determine
the optimal length of a DBC representing an integer n with a leading factor
dividing $2^a 3^b$. Simply compute $q(a,b,\ell,n)$ for increasing values of $\ell \geq 1$ until
a positive cardinality is returned. Also, along with the total number of DBCs,
it is possible to return the list of all the actual DBCs representing an integer,
by introducing a few simple modifications in Algorithms 1 and 2. We note
that we can further modify Algorithms 1 and 2 so that we compute only the 
DBCs having a specified length. Also, in case we are only interested in finding 
an optimal chain for a given integer n, we can implement a simple early abort 
technique to terminate the search once a DBC of a certain given size has been 
found. This is possible because these algorithms perform a depth-first search. 
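The modification that returns the actual chains can be sketched as a Python generator performing the same depth-first recursion as Algorithms 1 and 2. Chains are lists of $(c_i, a_i, b_i)$, leading term first; this representation is an assumption, and the sketch omits the early-abort optimization.

```python
from math import gcd


def val(p, m):
    """p-adic valuation of the integer m > 0."""
    v = 0
    while m % p == 0:
        m //= p
        v += 1
    return v


def chains(a, b, l, n, last):
    """Yield the DBCs representing n that end with the term `last` (+1 or -1),
    have a leading factor dividing 2^a 3^b and length <= l."""
    if n <= 0 or a < 0 or b < 0 or l <= 0:
        return
    if n == 1:
        if last == 1:
            yield [(1, 0, 0)]                       # the chain "1"
        else:                                       # 2^k - 2^(k-1) - ... - 1
            for k in range(1, min(a, l - 1) + 1):
                yield [(1, k, 0)] + [(-1, t, 0) for t in range(k - 1, -1, -1)]
        return
    m = n - last                                    # n - 1 or n + 1
    D = gcd(m, 2 ** a * 3 ** b)
    for i in range(val(2, D) + 1):
        for j in range(val(3, D) + 1):
            if i == 0 and j == 0:
                continue
            for sub_last in (1, -1):
                for sub in chains(a - i, b - j, l - 1, m // (2 ** i * 3 ** j), sub_last):
                    # multiply the sub-chain by d = 2^i 3^j, then append `last`
                    yield [(c, x + i, y + j) for c, x, y in sub] + [(last, 0, 0)]


def all_chains(a, b, l, n):
    """All DBCs representing n >= 1 with leading factor | 2^a 3^b, length <= l."""
    yield from chains(a, b, l, n, 1)
    yield from chains(a, b, l, n, -1)
    for ch in chains(a, b, l + 1, n - 1, -1):       # drop the final -1
        yield ch[:-1]
```

For instance, `list(all_chains(14, 10, 5, 542788))` should contain the chain given in Example 5 below.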

Example 5. Among the three optimal DBCs of length 5 with leading factor
dividing $2^{14} 3^{10}$ and representing 542788, one is

$$2^8 3^7 - 2^6 3^5 - 2^6 3^3 + 2^6 3 + 2^2.$$

The running time of this approach is largely driven by the length of the optimal chain that is returned. Typically, it takes a few seconds for chains of length
12 and up to a few hours for length 15. In general, it is practical to determine an optimal DBC for integers of size around 60 to 70 bits. See Section 5.1 and Table 1
for details, including actual experiments and timings of our C++ implementation,
which is available from our homepage, see [8].

So it is clear that computing an optimal DBC for a scalar of size around 200
bits, i.e. the kind of size typically used in elliptic curve cryptography, is completely out of reach with this approach. Instead, we consider another approach
to efficiently perform a random scalar multiplication [n]P.

3 Enumerating DBCs with Given Parameters 

Instead of computing the number of DBCs representing a given integer n, this
time we want to count the number of different DBCs with a given leading factor
$2^a 3^b$ and a given length $\ell$.

Remark 3. The same problem is straightforward for DBNS expansions. Indeed,
we see from (1) that there are $2^\ell \binom{(a+1)(b+1)}{\ell}$ different expansions of length $\ell$
such that $\max a_i \leq a$ and $\max b_i \leq b$. Note that all the expansions are different
in this count, but the integers they represent are not necessarily all different.

It is more involved to determine the number of unsigned DBCs (see Definition 3)
and of DBCs with a given leading factor $2^a 3^b$ and a given length $\ell$.
3.1 First Properties

Definition 4. Let $S_\ell(a,b)$ denote the number of unsigned DBCs of length $\ell$ with
a leading factor equal to $2^a 3^b$. Let $T_\ell(a,b)$ denote the number of unsigned DBCs
of length $\ell$ with a leading factor dividing $2^a 3^b$.

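Proposition 2. Let $\ell \geq 1$. We have:

1. $S_{\ell+1}(a,b) = T_\ell(a,b) - S_\ell(a,b)$.

2. $$T_{\ell+1}(a,b) = \sum_{i=0}^{a} \sum_{j=0}^{b} \left[(a - i + 1)(b - j + 1) - 1\right] S_\ell(i,j).$$

3. $S_\ell(a,b)$ and $T_\ell(a,b)$ are both symmetric polynomials.

4. The leading terms of $S_\ell(a,b)$ and of $T_\ell(a,b)$ are respectively

$$\frac{(ab)^{\ell-1}}{((\ell-1)!)^2} \quad \text{and} \quad \frac{(ab)^{\ell}}{(\ell!)^2}.$$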

Proof. The first three relations are a simple consequence of the definitions of
$S_\ell(a,b)$ and $T_\ell(a,b)$. To prove 4, we first note that $S_\ell(a,b)$ is of degree $2\ell - 2$
and $T_\ell(a,b)$ is of degree $2\ell$. This can be shown by induction based on $S_1(a,b) = 1$,
$T_1(a,b) = (a+1)(b+1)$, and using 1. and 2. We can now prove 4 by induction.
The property is true for $S_1(a,b)$ and $T_1(a,b)$. Also, by 1. and given that $S_{\ell+1}(a,b)$
and $T_\ell(a,b)$ are of the same degree, it is clear that their leading terms are equal.
So by the induction hypothesis, it is clear that the property holds for $S_\ell(a,b)$,
for all $\ell \geq 1$. Now assuming it holds for $T_{\ell-1}(a,b)$, let us show that it holds for
$T_\ell(a,b)$. Using the induction hypothesis, we observe that the leading term of

$$T_\ell(a,b) = \sum_{i=0}^{a} \sum_{j=0}^{b} \left[(a - i + 1)(b - j + 1) - 1\right] S_{\ell-1}(i,j)$$

is equal to the leading term of

$$\frac{1}{((\ell-2)!)^2} \sum_{i=0}^{a} \sum_{j=0}^{b} (a - i)(b - j)(ij)^{\ell-2}.$$

Next, we note that the leading term of $\sum_{i=0}^{a} i^k$ is $\frac{a^{k+1}}{k+1}$. We deduce that the
leading term of

$$\sum_{i=0}^{a} \sum_{j=0}^{b} (a - i)(b - j)(ij)^{\ell-2}$$

is

$$(ab)^{\ell} \left(\frac{1}{\ell^2} + \frac{1}{(\ell-1)^2} - \frac{2}{\ell(\ell-1)}\right) = \frac{(ab)^{\ell}}{\ell^2 (\ell-1)^2}.$$

It follows that the leading term of $T_\ell(a,b)$ is

$$\frac{(ab)^{\ell}}{(\ell!)^2},$$

as expected. □
Remark 4. The numbers of signed DBCs of length $\ell$ with a leading factor equal
to $2^a 3^b$ and dividing $2^a 3^b$ can be easily deduced from $S_\ell(a,b)$ and $T_\ell(a,b)$, respectively. Namely, it is only necessary to multiply by a factor $2^\ell$. Note that all
those DBCs represent positive and negative integers. But it is easy to see that the
sign of the integer represented by a chain corresponds to the sign of the largest
term of the chain; see Lemma 1 in Section 4.1. So if we are only interested in
DBCs representing positive values, the multiplication factor between unsigned
and signed DBCs should be $2^{\ell-1}$.


3.2 Explicit Computations 

Recall that $S_1(a,b) = 1$ and $T_1(a,b) = (a+1)(b+1)$. Proposition 2 can then be
used to explicitly determine the polynomials $S_\ell$ and $T_\ell$ of rank $\ell \geq 2$ recursively.
For instance, we have $S_2(a,b) = ab + a + b$ from 1. and $T_2(a,b) = \frac{1}{4}(ab + 2a + 2b)\, T_1(a,b)$ using 2. We can then compute $S_3(a,b)$, then $T_3(a,b)$, and so on.
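For moderate values of a and b, $S_\ell$ and $T_\ell$ can also be evaluated numerically rather than formally. The Python sketch below iterates item 1 of Proposition 2 on a grid and obtains $T_\ell$ as a two-dimensional prefix sum of $S_\ell$, which is equivalent to item 2: every chain with a leading factor dividing $2^a 3^b$ has a leading factor equal to $2^i 3^j$ for a unique $(i,j) \leq (a,b)$. This is an illustration, not the authors' PARI/GP implementation, and its cost grows like $\ell \cdot a \cdot b$.

```python
def count_unsigned_dbcs(a, b, l):
    """Return (S_l(a, b), T_l(a, b)) for l >= 1 by numerical iteration.

    S[i][j] holds S_k(i, j) for 0 <= i <= a, 0 <= j <= b.  T_k(i, j) is the
    two-dimensional prefix sum of S_k, and S_{k+1} = T_k - S_k (item 1).
    """
    if l < 1:
        raise ValueError("l must be >= 1")
    S = [[1] * (b + 1) for _ in range(a + 1)]        # S_1(i, j) = 1
    for k in range(1, l + 1):
        T = [[0] * (b + 1) for _ in range(a + 1)]
        for i in range(a + 1):
            for j in range(b + 1):
                T[i][j] = (S[i][j]
                           + (T[i - 1][j] if i else 0)
                           + (T[i][j - 1] if j else 0)
                           - (T[i - 1][j - 1] if i and j else 0))
        if k == l:
            return S[a][b], T[a][b]
        S = [[T[i][j] - S[i][j] for j in range(b + 1)] for i in range(a + 1)]


def positive_signed_dbcs(a, b, l):
    """Signed DBCs of length l with leading factor exactly 2^a 3^b that
    represent positive integers: 2^(l-1) S_l(a, b), as in Remark 4."""
    return 2 ** (l - 1) * count_unsigned_dbcs(a, b, l)[0]
```

The values it returns for $\ell = 2$ agree with the closed forms of $S_2$ and $T_2$ given above.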

In practice, however, the complexity of those polynomials rapidly grows with
$\ell$ and it quickly becomes impossible to compute them formally. Fortunately, we
are only interested in the value of these polynomials at a specific pair $(a_0, b_0)$.
This can be done very efficiently using some precomputations and Lagrange
interpolation. Since $S_\ell$ is a polynomial of degree $\ell - 1$ in a and $\ell - 1$ in b, it is
enough to know the value of $S_\ell$ at $\ell^2$ pairs $(a_i, b_j)$, for $(i,j) \in [1,\ell]^2$, in order
to compute $S_\ell(a_0, b_0)$. First, for each $i \in [1,\ell]$, we interpolate with respect to
the second coordinate based on the values $S_\ell(a_i, b_j)$, for $j \in [1,\ell]$. We obtain
$\ell$ polynomials in the variable b. Specializing those polynomials at $b_0$, we obtain
$\ell$ values, and a second Lagrange interpolation, followed by a specialization at
$a_0$, gives $S_\ell(a_0, b_0)$. Note that in order to find the Lagrange polynomial $P(x)$
interpolating the points $(x_k, f(x_k))$, it is faster, in our case, to use the following
formulas

$$P(x) = w(x) \sum_{k=1}^{\ell} \frac{f(x_k)}{w'(x_k)(x - x_k)}, \quad \text{with } w(x) = \prod_{j=1}^{\ell} (x - x_j),$$

rather than a more classical approach such as Aitken's method. For each length
$\ell$, the $\ell^2$ precomputed values can be obtained with Proposition 2. There is a
similar approach for evaluating $T_\ell$ at $(a_0, b_0)$.
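The two-stage interpolation can be sketched in Python with exact rational arithmetic. `lagrange_eval` implements the formula for $P(x)$ given above, and `eval_2d` interpolates first in the second coordinate and then in the first. The grid layout `f_grid[i][j] = F(nodes[i], nodes[j])` and both function names are assumptions of this sketch.

```python
from fractions import Fraction


def lagrange_eval(xs, ys, x0):
    """Value at x0 of the polynomial through the points (xs[k], ys[k]),
    using P(x) = w(x) * sum_k f(x_k) / (w'(x_k) (x - x_k)),
    with w(x) = prod_j (x - x_j).  xs: distinct integers; ys: ints/Fractions."""
    if x0 in xs:                       # the formula divides by x0 - x_k
        return Fraction(ys[xs.index(x0)])
    w = Fraction(1)
    for xj in xs:
        w *= x0 - xj
    total = Fraction(0)
    for k, xk in enumerate(xs):
        w_prime = 1                    # w'(x_k) = prod_{j != k} (x_k - x_j)
        for j, xj in enumerate(xs):
            if j != k:
                w_prime *= xk - xj
        total += Fraction(ys[k]) / (w_prime * (x0 - xk))
    return w * total


def eval_2d(f_grid, nodes, a0, b0):
    """F(a0, b0) from f_grid[i][j] = F(nodes[i], nodes[j]), for F of degree
    < len(nodes) in each variable: interpolate in b and specialize at b0,
    then interpolate the resulting values in a and specialize at a0."""
    partial = [lagrange_eval(nodes, row, b0) for row in f_grid]
    return lagrange_eval(nodes, partial, a0)


# S_2(a, b) = ab + a + b has degree 1 in each variable: nodes [1, 2] suffice.
grid = [[3, 5], [5, 8]]                # S_2(1,1), S_2(1,2); S_2(2,1), S_2(2,2)
print(eval_2d(grid, [1, 2], 10, 7))    # 87
```

With $\ell$ nodes per variable, this evaluates $S_\ell$ at any $(a_0, b_0)$ from the $\ell^2$ precomputed values.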

Our PARI/GP implementation deals efficiently with lengths $\ell$ up
to 150. For most pairs (a,b), it takes less than 50 ms to evaluate $S_\ell(a,b)$ or
$T_\ell(a,b)$. In any case, at most a few seconds are necessary. The corresponding
precomputations require about 45 MB. Only 10 MB are necessary to handle
lengths $\ell$ up to 100. See [8] to access the actual implementation.


3.3 Generalization to Multi-Base Chains 

It is easy to generalize the previous results to Multi-Base Chains. Let $p_1, \ldots, p_k$
be k pairwise coprime bases. A Multi-Base Chain (MBC) allows us to represent a
positive integer n as

$$n = \sum_{i=1}^{\ell} c_i\, p_1^{a_{1,i}} \cdots p_k^{a_{k,i}}, \quad \text{with } c_1 = 1 \text{ and } c_i = \pm 1 \text{ for } i > 1,$$

and $a_{j,1} \geq a_{j,2} \geq \cdots \geq a_{j,\ell}$, for all $j \in [1,k]$. An unsigned Multi-Base Chain is
similar to a Multi-Base Chain except that all the $c_i$'s are equal to 1. In any case,
we assume that each term $p_1^{a_{1,i}} \cdots p_k^{a_{k,i}}$ appears at most once in any expansion.

Definition 5. Let $\mathbf{a}$ denote the vector $(a_1, \ldots, a_k)$ and let $S_\ell(\mathbf{a})$ be the number
of unsigned Multi-Base Chains of length $\ell$ satisfying $a_{j,1} = a_j$, for all j. Also,
let $T_\ell(\mathbf{a})$ be the number of unsigned Multi-Base Chains of length $\ell$ satisfying
$a_{j,1} \leq a_j$, for all j.

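The following proposition is a simple generalization of Proposition 2. The proof
is also similar.

Proposition 3. Let $\ell \geq 1$. We have

1. $S_{\ell+1}(\mathbf{a}) = T_\ell(\mathbf{a}) - S_\ell(\mathbf{a})$.

2. $$T_{\ell+1}(\mathbf{a}) = \sum_{i_1=0}^{a_1} \cdots \sum_{i_k=0}^{a_k} \left[\prod_{j=1}^{k} (a_j - i_j + 1) - 1\right] S_\ell(i_1, \ldots, i_k).$$

3. $S_\ell(\mathbf{a})$ and $T_\ell(\mathbf{a})$ are both symmetric polynomials.

4. The leading terms of $S_\ell(\mathbf{a})$ and of $T_\ell(\mathbf{a})$ are respectively

$$\frac{(a_1 \cdots a_k)^{\ell-1}}{((\ell-1)!)^k} \quad \text{and} \quad \frac{(a_1 \cdots a_k)^{\ell}}{(\ell!)^k}.$$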

Remark 5. Again, the number of MBCs of length $\ell$ with a leading factor equal to
or dividing $p_1^{a_1} \cdots p_k^{a_k}$ can be easily deduced from $S_\ell(\mathbf{a})$ or $T_\ell(\mathbf{a})$. Namely, it
is only necessary to multiply by a factor $2^{\ell-1}$.


Example 6. For k = 3, we have

$$S_1(\mathbf{a}) = 1, \qquad T_1(\mathbf{a}) = (a_1 + 1)(a_2 + 1)(a_3 + 1),$$

$$S_2(\mathbf{a}) = a_1 a_2 a_3 + a_1 a_2 + a_1 a_3 + a_2 a_3 + a_1 + a_2 + a_3,$$

$$T_2(\mathbf{a}) = \frac{1}{8}\left(a_1 a_2 a_3 + 2a_1 a_2 + 2a_1 a_3 + 2a_2 a_3 + 4a_1 + 4a_2 + 4a_3\right) T_1(\mathbf{a}).$$


4 Controlled DBC for Scalar Multiplication 

For cryptographic applications, we propose a new way to perform a random
scalar multiplication based on the concept of controlled DBC. The idea is to
directly generate a random DBC expansion instead of choosing a random integer
n and then finding a corresponding DBC to represent it.
Definition 6. Given a leading factor $2^a3^b$ and a given length $\ell$, the controlled DBC approach refers to the generation of a DBC expansion

$$\sum_{i=1}^{\ell} c_i 2^{a_i}3^{b_i}, \quad \text{with } c_i \in \{-1, 1\},$$

such that $c_1 = 1$, $a_1 = a$, $b_1 = b$, and whose $\ell - 1$ remaining terms $c_i 2^{a_i}3^{b_i}$ are selected to satisfy $a_1 \ge a_2 \ge \cdots \ge a_\ell$ and $b_1 \ge b_2 \ge \cdots \ge b_\ell$.
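Definition 6 does not specify how the $\ell - 1$ remaining terms are chosen. The sketch below shows one simple way to obtain a valid controlled DBC. `random_controlled_dbc` is a hypothetical helper, not the authors' generator. At each step it picks the next exponent pair uniformly among the pairs strictly below the current one that still leave room for the remaining terms. It relies on the fact that a chain whose top pair is $(x, y)$ has at most $x + y + 1$ terms. The resulting distribution is not uniform, either over chains or over the represented integers.

```python
import random

def random_controlled_dbc(a, b, ell, rng=random):
    """Return [(c_i, a_i, b_i)] with c_1 = 1, (a_1, b_1) = (a, b), non-increasing
    exponents and distinct terms. Not uniform over chains or integers."""
    if not 1 <= ell <= a + b + 1:
        raise ValueError("a chain below 2^a 3^b has at most a + b + 1 terms")
    terms = [(1, a, b)]
    ca, cb = a, b
    for remaining in range(ell - 1, 0, -1):
        # (x, y) must lie strictly below (ca, cb) and admit remaining - 1
        # further terms, i.e. x + y + 1 >= remaining.
        cands = [(x, y) for x in range(ca + 1) for y in range(cb + 1)
                 if (x, y) != (ca, cb) and x + y >= remaining - 1]
        ca, cb = rng.choice(cands)
        terms.append((rng.choice((-1, 1)), ca, cb))
    return terms

def dbc_value(terms):
    """Integer represented by a list of (c_i, a_i, b_i) terms."""
    return sum(c * 2**x * 3**y for c, x, y in terms)
```

Because the exponent pair strictly decreases and the room condition is kept as an invariant, the candidate list is never empty.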

This has two main advantages. First, no conversion is necessary with this approach, whereas the greedy approach, although very efficient, still requires some time to return a DBC. Second, there is no guarantee that the DBC expansion returned by the greedy approach is optimal. In fact, we have evidence that the greedy method returns a DBC that is far from optimal in general, especially for large integers; see Section 5.2 and Figure 2. By choosing the DBC expansion first, in particular its leading factor as well as its length, we can get closer to the average optimal length. As a result, we can perform a scalar multiplication faster than with the DBC obtained with the greedy approach, by saving many additions. This approach raises a few questions, in particular regarding a suitable selection of the length. For a given size and a given leading factor, it is possible to estimate the length that corresponds heuristically to the average optimal length of a DBC representing integers of that size with that leading factor. See Definition 7 for the notion of Near Optimal Length.

First, let us address the range of the integers that can be represented a priori with a DBC having a leading term equal to $2^a3^b$.


4.1 Integer Range 

The following result provides an answer. 

Lemma 1. Any DBC with leading term $2^a3^b$ (that is, with $c_1 = 1$) represents an integer belonging to the interval

$$\left[\frac{3^b+1}{2},\; 2^{a+1}3^b - \frac{3^b+1}{2}\right].$$

It follows that the sign of the integer represented by a DBC with leading factor equal to $2^a3^b$ is driven by the sign of the coefficient of the leading factor in the DBC.


Proof. It is not difficult to see that the largest integer represented with a DBC having a leading factor equal to $2^a3^b$ can be constructed with a greedy-type approach. In other words, it is enough to pick the largest available term at each step to end up with the largest possible integer. Starting from $2^a3^b$, the next term in the DBC is of the form $2^i3^j$ with $i \le a$, $j \le b$, and $(i, j) \ne (a, b)$. Assuming that $a \ge 1$, clearly $2^{a-1}3^b$ is the largest possible integer we can pick. If $a = 0$, then the largest available term is $3^{b-1}$. Repeating this argument, we deduce that the largest integer that can be represented is

$$2^a3^b + \sum_{i=0}^{a-1} 2^i3^b + \sum_{j=0}^{b-1} 3^j = 2^{a+1}3^b - \frac{3^b+1}{2}.$$

Similarly, the smallest integer is obtained by subtracting the same terms:

$$2^a3^b - \sum_{i=0}^{a-1} 2^i3^b - \sum_{j=0}^{b-1} 3^j = \frac{3^b+1}{2}.$$

Finally, it is obvious that if a DBC starts with $-2^a3^b$, then the integers that can be represented with this DBC belong to the interval

$$\left[-2^{a+1}3^b + \frac{3^b+1}{2},\; -\frac{3^b+1}{2}\right].$$

So integers represented by a DBC starting with $2^a3^b$ are always positive and those represented by a DBC starting with $-2^a3^b$ are always negative. □
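The bounds of Lemma 1 can be checked exhaustively for tiny leading factors. The sketch below walks through every signed DBC with leading term $+2^a3^b$ and records the smallest and largest value reached. `dbc_extremes` is a hypothetical helper. The enumeration grows exponentially, so it is only practical for very small $a$ and $b$.

```python
def dbc_extremes(a, b):
    """Brute-force (min, max) over all DBCs with leading term +2^a 3^b."""
    lo = hi = None

    def walk(x, y, value):
        nonlocal lo, hi
        # Every prefix of a chain is itself a valid DBC, so record each value.
        lo = value if lo is None else min(lo, value)
        hi = value if hi is None else max(hi, value)
        for i in range(x + 1):
            for j in range(y + 1):
                if (i, j) != (x, y):
                    for c in (1, -1):
                        walk(i, j, value + c * 2**i * 3**j)

    walk(a, b, 2**a * 3**b)
    return lo, hi
```

The symmetry used in the proof is visible here: flipping the signs of all non-leading terms maps the maximum to $2^{a+1}3^b - \max$, which is the minimum.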

The work in Section 3 gives the exact cardinality of the set containing all the DBCs with selected parameters. It is then tempting to select a length $\ell$ giving rise to as many DBCs as there are integers in the interval given in Lemma 1. However, this ignores the fact that an integer generally has many different DBC representations.


4.2 Redundancy and Near Optimal Length 

In the controlled DBC approach, we need to be careful in selecting the length $\ell$, as generating DBCs that are not long enough could compromise the security of the cryptosystem by severely restricting the number of scalars that can be represented with those chains. What length is then long enough? See Definition 7 for the notion of Near Optimal Length, which addresses this question.

For various leading factors up to $2^{30}3^{10}$ and lengths between 1 and 12, we have computed the number of different optimal representations of integers having an optimal DBC with this particular leading factor and length. For every selection of parameters, we consider between 100 and 10,000 such integers. We then compute the average number of optimal DBCs for each length between 1 and 12, taking into account all the possible leading factors. This search was carried out with the algorithms developed in Section 2. The data fit an exponential regression of the form $y = \exp(0.4717x - 1.1683)$ with $R^2 = 0.9975$; see Figure 1.

To double-check the relevance of this estimate, we investigate DBCs having a leading factor of the form $2^i$. We know that in this case the optimal length is the one given by the NAF, that is, about $i/3$ on average. We then compute the number of DBCs with a leading factor equal to $2^i$ and a length equal to $\ell$ using what we have done in Section 3. Dividing this quantity by $2^{i+1}$, which corresponds approximately to the number of integers that can be represented a priori, we should obtain an estimate of the average number of optimal DBCs representing an integer, i.e. something close to $\exp(0.4717\ell - 1.1683)$. For all $i \in [10, 100]$, the ratio between these two quantities lies in the interval [0.0974, 3.384]. This tends to confirm the relevance of our estimate, at least for relatively small values of $b$ (0 in this case).

Fig. 1. Curve $\exp(0.4717x - 1.1683)$ fitting the experimental data

Definition 7. For a leading factor equal to $2^a3^b \approx 2^t$, the Near Optimal Length corresponds to the integer value $\ell$ minimizing

$$\left|\, 2^{\ell-1} S_\ell(a,b) - 2^t \left[\exp(0.4717\ell - 1.1683)\right] \right|.$$

Indeed, we expect that the average number of different DBC expansions of length $\ell$ representing the same integer is close to $[\exp(0.4717\ell - 1.1683)]$. Heuristically, we also expect that this redundancy factor multiplied by $2^t$ is equal to $2^{\ell-1} S_\ell(a,b)$ for the average optimal length $\ell$.
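Definition 7 can be turned into a short search over $\ell$. The sketch below computes $S_\ell(a,b)$ for $k = 2$ with the recurrence $S_{\ell+1} = T_\ell - S_\ell$, where $T_\ell$ is obtained as a two-dimensional prefix sum of $S_\ell$. It then returns the $\ell$ minimizing the absolute difference above. Two points are assumptions: the bracket $[\cdot]$ is read as rounding to the nearest integer, and the minimized quantity is taken in absolute value. `dbc_counts` and `near_optimal_length` are hypothetical names.

```python
import math

def dbc_counts(a, b, max_len):
    """Return {ell: S_ell(a, b)} for 1 <= ell <= max_len (unsigned DBC counts)."""
    grid = [[1] * (b + 1) for _ in range(a + 1)]      # S_1(x, y) = 1 for all x, y
    result = {1: 1}
    for ell in range(1, max_len):
        # pref[x][y] = T_ell(x, y) = sum of S_ell(i, j) over i <= x, j <= y
        pref = [[0] * (b + 1) for _ in range(a + 1)]
        for x in range(a + 1):
            row = 0
            for y in range(b + 1):
                row += grid[x][y]
                pref[x][y] = row + (pref[x - 1][y] if x else 0)
        grid = [[pref[x][y] - grid[x][y] for y in range(b + 1)] for x in range(a + 1)]
        result[ell + 1] = grid[a][b]                  # S_{ell+1}(a, b)
    return result

def near_optimal_length(a, b, t):
    """Length minimizing |2^(ell-1) S_ell(a,b) - 2^t [exp(0.4717 ell - 1.1683)]|."""
    counts = dbc_counts(a, b, a + b + 1)              # longer chains do not exist
    def gap(ell):
        redundancy = round(math.exp(0.4717 * ell - 1.1683))
        return abs(2 ** (ell - 1) * counts[ell] - 2 ** t * redundancy)
    return min(counts, key=gap)
```

As a sanity check, for $b = 0$ the counts reduce to binomial coefficients, $S_\ell(a, 0) = \binom{a}{\ell - 1}$. Python integers are unbounded, so leading factors of cryptographic size such as $2^{151}3^{26}$ can be handled, although somewhat slowly.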


4.3 Applications to Elliptic Curve Cryptography 

For a chosen coordinate system representing a point on an elliptic curve and the corresponding complexities of a doubling, a tripling, and a mixed addition, it is possible to determine the optimal parameters, i.e. leading factor $2^a3^b$ and length $\ell$, which minimize the overall cost of a scalar multiplication with that particular coordinate system, without compromising the security of the system.

Definition 8. For a given coordinate system and a bit size $t$, the Near Optimal Controlled (NOC) DBC method refers to the generation of a Controlled DBC with Near Optimal Length which minimizes the cost of a scalar multiplication.
In practice, we first select the bit size $t$, then consider all the possible pairs $(a, b)$ such that $2^a3^b \approx 2^t$. For each pair $(a, b)$, we work out the corresponding Near Optimal Length $\ell$. Then we can compute the overall complexity to perform a scalar multiplication based on a controlled DBC with leading factor $2^a3^b$ and length $\ell$. It is then a matter of selecting the pair $(a, b)$ corresponding to the lowest overall complexity. See Figure 2 and Tables 2 and 3.
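Computing "the overall complexity" requires an operation count for a scalar multiplication driven by a controlled DBC. A minimal sketch, assuming a Horner-like evaluation, is shown below. This is a standard way to use a chain with non-increasing exponents, but the text does not say which evaluation order the authors use. The scheme performs $Q \leftarrow 2^{a_i - a_{i+1}} 3^{b_i - b_{i+1}} Q + c_{i+1} P$ for each term and finishes with $2^{a_\ell}3^{b_\ell}$. Overall it uses $a_1$ doublings, $b_1$ triplings and $\ell - 1$ additions. The group operations are passed in as callables, and `dbc_scalar_mult` and `dbc_cost` are hypothetical names. The chain is assumed to be valid and is not checked.

```python
def dbc_scalar_mult(terms, P, dbl, tpl, add, neg):
    """Compute [n]P for n = sum c_i 2^a_i 3^b_i, terms = [(c_i, a_i, b_i)], leading first.

    Returns (result, operation counts). Uses a_1 doublings, b_1 triplings and
    len(terms) - 1 additions.
    """
    c1, cur_a, cur_b = terms[0]
    Q = P if c1 == 1 else neg(P)
    counts = {"dbl": 0, "tpl": 0, "add": 0}
    for c, x, y in terms[1:]:
        for _ in range(cur_a - x):
            Q = dbl(Q)
            counts["dbl"] += 1
        for _ in range(cur_b - y):
            Q = tpl(Q)
            counts["tpl"] += 1
        Q = add(Q, P if c == 1 else neg(P))
        counts["add"] += 1
        cur_a, cur_b = x, y
    for _ in range(cur_a):                      # final factor 2^(a_ell)
        Q = dbl(Q)
        counts["dbl"] += 1
    for _ in range(cur_b):                      # final factor 3^(b_ell)
        Q = tpl(Q)
        counts["tpl"] += 1
    return Q, counts

def dbc_cost(a, b, ell, D, T, A):
    """Cost of one such scalar multiplication given per-operation costs D, T, A."""
    return a * D + b * T + (ell - 1) * A
```

Choosing the pair $(a, b)$ then amounts to evaluating `dbc_cost` with the Near Optimal Length of each candidate pair and keeping the minimum. Replacing the group by the integers (with `P = 1`) makes the function return $n$ itself, which is a convenient check.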

5 Experiments 

We have implemented the work described in Section 3 in C++ using NTL 6.0.0 [21] built on top of GMP 5.1.2 [15]. The approach described in Section 4 is implemented in PARI/GP 2.7.1 [22]. See [3] to access the actual C++ and PARI/GP implementations. All the programs are executed on a quad-core i7-2620 at 2.70 GHz.

5.1 Optimal DBC Search 

Given an integer $n$, the running time of Algorithms 1 and 2 to find the optimal length of a DBC representing $n$ with a leading factor dividing $2^a3^b$ is largely driven by the length $\ell$ of this optimal expansion. It usually takes several minutes for DBCs of length 14; see Table 1.


Table 1. Average running times to find an optimal DBC of length $\ell$

Length ℓ     9      10     11      12      13       14
Time (s)     1.08   5.21   28.52   66.38   214.80   757.91


Considering integers related to π, the longest optimal DBC that we have been able to compute corresponds to the 69-bit integer 314159265358979323846 with a leading factor equal to $2^{38}3^{19}$ and length 18. It takes about 22 hours to show that there is no expansion of length less than or equal to 17, and a bit less than six hours to return an optimal expansion of length 18 with the early abort technique mentioned in Section 2.4. Interestingly, in this particular case the greedy approach returns a DBC of length 18, so an optimal DBC can be obtained almost instantly.


5.2 Comparison between Greedy and Near Optimal Length 

We have run some tests for sizes 192, 256, 320, 384, 448, and 512 bits. For each size $t$, we have considered various leading factors of the form $2^a3^b \approx 2^t$. More precisely, we fix $a$ between $t/2$ and $t$, compute the corresponding $b$, and then compute the average length of the DBCs returned by the greedy method for 5,000 random integers. We also compute the Near Optimal Length of a DBC with leading factor equal to $2^a3^b$; see Definition 7 in Section 4.2. Our computations indicate that considering controlled DBCs that are 20 to 30% shorter than those returned by the greedy algorithm should not significantly reduce the set of integers that can be represented. See Figure 2, which shows a comparison for size $t = 320$. The x-axis corresponds to $a$ between 160 and 315. The y-axis corresponds to the average length of the DBCs.



Fig. 2. Comparison between the average length of the DBCs returned by the greedy 
method and the Near Optimal Length for size 320 bits 

5.3 Scalar Multiplication 

In this part, we are interested in the potential savings introduced by our new scalar multiplication framework described in Section 4, in particular using the notion of Near Optimal Controlled DBC; see Definition 8.

In the following, we select the Inverted Edwards coordinate system [4] for a curve defined over a large prime field $\mathbb{F}_p$. This system offers a very fast doubling and a reasonably cheap mixed addition and tripling [2]. More precisely, the respective costs of a doubling, mixed addition, and tripling are 3M + 4S, 8M + S, and 9M + 4S, where M and S stand respectively for a multiplication and a squaring in $\mathbb{F}_p$. To allow easy comparisons and as is customary, we assume that S = 0.8M.
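With these costs, a doubling is worth 6.2M, a mixed addition 8.8M and a tripling 12.2M. The theoretical costs reported in Table 2 appear to follow the simple model $a \cdot \mathrm{DBL} + b \cdot \mathrm{TPL} + (\ell - 1) \cdot \mathrm{MADD}$ for a chain with leading factor $2^a3^b$ and (average) length $\ell$. The NAF is the special case $b = 0$. The speedups match $(\mathrm{reference} - \mathrm{cost}) / \mathrm{reference}$. The sketch below encodes this reading of the table. The function names are hypothetical, and the model is inferred from the reported numbers rather than stated in the text.

```python
S_OVER_M = 0.8                      # S = 0.8M, as assumed in the text
DBL = 3 + 4 * S_OVER_M              # doubling       3M + 4S -> 6.2M
MADD = 8 + 1 * S_OVER_M             # mixed addition 8M + S  -> 8.8M
TPL = 9 + 4 * S_OVER_M              # tripling       9M + 4S -> 12.2M

def scalar_mult_cost(a, b, ell):
    """Cost in M: a doublings, b triplings and ell - 1 mixed additions."""
    return a * DBL + b * TPL + (ell - 1) * MADD

def speedup(cost, reference):
    """Relative saving of `cost` with respect to `reference`, in percent."""
    return 100 * (reference - cost) / reference
```

For 192 bits, `scalar_mult_cost(151, 26, 37)` gives 1570.2 (NOC), `scalar_mult_cost(116, 48, 44.63)` about 1688.74 (greedy) and `scalar_mult_cost(192, 0, 64)` gives 1744.8 (NAF). These are the first row of Table 2.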

Until now, computing $[n]P$ for a random $n$ in Inverted Edwards coordinates with a DBC was not really worth it. Indeed, only the greedy method was fast enough to return a DBC in a reasonable time, and the overall savings obtained were marginal with respect to the NAF, whose recoding can be achieved much faster. With the NAF, we perform $t$ doublings and approximately $t/3$ mixed additions in order to compute $[n]P$ where $n$ is of size $t$ bits.
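For comparison with the DBC approaches, the NAF recoding itself is a short loop. The sketch below is the standard right-to-left NAF algorithm, given for illustration (`naf` is a hypothetical name). An odd value is reduced by $d = 2 - (n \bmod 4) \in \{1, -1\}$, which makes the next digit zero. No two adjacent digits are then nonzero, and the average density of nonzero digits is about $1/3$.

```python
def naf(n):
    """Non-adjacent form of n >= 0, least significant digit first, digits in {-1, 0, 1}."""
    digits = []
    while n > 0:
        if n & 1:
            d = 2 - (n & 3)          # n = 1 mod 4 -> 1, n = 3 mod 4 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits
```

For example, `naf(7)` returns `[-1, 0, 0, 1]`, i.e. $7 = 8 - 1$. Counting the nonzero digits of a $t$-bit scalar gives approximately $t/3$ on average, which is the number of mixed additions quoted above.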
In Table 2 we display the parameters, costs, and speedups corresponding to different methods, for various sizes between 192 and 512 bits. First, we consider the Near Optimal Controlled DBC approach, then the greedy method, and finally the NAF. LF stands for the leading factor and $\ell$ is the length of the corresponding expansion. The costs are expressed in terms of the number of multiplications needed to compute $[n]P$ but do not take into account the effort to produce each expansion. Regarding the NOC DBC, we determine for each size the optimal leading factor $2^a3^b$ and corresponding Near Optimal Length $\ell$ minimizing the cost of the scalar multiplication, as explained in Section 5.2. Similarly, for the greedy approach we rely on the computations of Section 5.2.


Table 2. Theoretical comparison between NOC, greedy, and NAF methods (costs in M; S1 and S2 are the speedups of NOC over the greedy method and over the NAF, respectively)

Size   NOC: LF, ℓ, Cost           Greedy: LF1, ℓ1, Cost1         NAF: LF2, ℓ2, Cost2      Speedups: S1, S2
192    2^151 3^26,  37, 1570.20   2^116 3^48,   44.63, 1688.74   2^192,  64.00, 1744.80   7.02%, 10.01%
256    2^198 3^37,  48, 2092.60   2^153 3^65,   58.73, 2249.62   2^256,  85.33, 2329.33   6.98%, 10.16%
320    2^260 3^38,  62, 2612.40   2^180 3^89,   70.80, 2816.04   2^320, 106.67, 2913.87   7.23%, 10.35%
384    2^297 3^55,  71, 3128.40   2^217 3^106,  84.74, 3375.51   2^384, 128.00, 3498.40   7.32%, 10.58%
448    2^369 3^50,  86, 3645.80   2^254 3^123,  98.73, 3935.42   2^448, 149.33, 4082.93   7.36%, 10.71%
512    2^406 3^67,  95, 4161.80   2^286 3^143, 112.07, 4495.22   2^512, 170.67, 4667.47   7.42%, 10.83%


To validate these theoretical results, we have developed an implementation in C++ using NTL 6.0.0 [21] built on top of GMP 5.1.2 [15]. The program is compiled and executed on a quad-core i7-2620 at 2.70 GHz. For $t$ = 192, 256, 320, 384, 448, and 512, we generate a random prime number $p_t$ having bit size $t$. For each $p_t$, we then create a total of 100 curves of the form

$$E : x^2 + y^2 = c^2(1 + dx^2y^2)$$

defined over $\mathbb{F}_{p_t}$, where $c$ and $d$ are small random values. For each curve $E$, we determine a random point $P$ on $E$. Next, we select 100 random scalars in the interval $[0, p_t - 1]$. The corresponding NAF and greedy DBC expansions with a leading factor as in Table 2 are then computed for each scalar. For each $\ell$, we also directly create 100 random DBC expansions of length $\ell$ returned by the controlled DBC approach. Since we only want to assess the efficiency of the scalar multiplication, our only constraint is to generate a DBC with the specified length $\ell$ and leading factor as in Table 2. In practice, the method used to generate the expansions should be thoroughly designed and analyzed to ensure that the integers that are produced are uniformly distributed. This will be the object of some future work.
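The text does not detail how the random point $P$ is found. A minimal sketch solves the curve equation for $y$: from $x^2 + y^2 = c^2(1 + dx^2y^2)$ we get $y^2 = (c^2 - x^2)/(1 - c^2dx^2)$. The sketch picks random $x$ until the right-hand side is a square. It assumes $p \equiv 3 \pmod 4$, so that a square root is the single exponentiation $r^{(p+1)/4}$. A general prime would need, for example, Tonelli-Shanks. `random_point` is a hypothetical helper, not part of the authors' NTL program.

```python
import random

def random_point(p, c, d, rng=random):
    """Random affine point on x^2 + y^2 = c^2 (1 + d x^2 y^2) over F_p, p = 3 mod 4."""
    if p % 4 != 3:
        raise ValueError("this sketch only handles p = 3 mod 4")
    c2 = c * c % p
    while True:
        x = rng.randrange(p)
        den = (1 - c2 * d * x * x) % p
        if den == 0:
            continue
        r = (c2 - x * x) * pow(den, -1, p) % p    # candidate value of y^2
        y = pow(r, (p + 1) // 4, p)               # square root if r is a square
        if y * y % p == r:
            return x, y
```

About half of the values of $x$ succeed, so the loop ends quickly. `pow(den, -1, p)` needs Python 3.8 or later.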

The experiments confirm the theoretical complexity analysis provided in Table 2, especially regarding S2. The discrepancy between the theoretical and the experimental values of S1 can be explained by a ratio S/M that is closer to 0.95 in NTL than to the 0.8 initially assumed.

See Table [3] for actual timings. Note that the respective times necessary to 
compute the expansions for each method are not counted. 

Table 3. Comparison of running times of NOC, greedy, and NAF methods (times in ms)

Size   NOC     Greedy   NAF     S1      S2
192    0.822   0.861    0.939   4.58%   12.49%
256    1.444   1.531    1.642   5.73%   12.08%
320    2.446   2.584    2.766   5.35%   11.58%
384    3.511   3.703    3.960   5.17%   11.33%
448    5.088   5.392    5.729   5.65%   11.20%
512    6.569   6.982    7.408   5.91%   11.32%


6 Conclusion and Future Work 

In this article, we have introduced new techniques to compute an optimal DBC representing a given integer. The algorithms that we have developed allow us to tackle sizes of around 60 to 70 bits in a reasonable time.

We have also developed a new way to produce DBCs, namely the controlled DBC approach, which directly creates a DBC expansion instead of selecting an integer and converting it to DBC format. This idea raises a few issues regarding the choice of parameters, in particular the length of the expansion.

We use heuristics to estimate the average length of an optimal DBC expansion 
representing an integer of a certain bit size with a given leading factor. This 
estimate is based on the enumeration of the DBCs with given parameters and 
the expected number of different optimal DBCs representing the same integer. 

For a given size and coordinate system, these heuristics allow us to determine the optimal parameters, i.e. leading factor and length, which minimize the overall cost of a scalar multiplication of that size. This gives rise to the concept of Near Optimal Controlled DBC. Our experiments show speedups for this approach in excess of 10% over the NAF and of about 5% over the greedy method. Those computations do not take into account the time necessary to produce the expansions. The interest of this new method is therefore even greater, since, unlike with the greedy and NAF methods, the expansions do not have to be computed.

In the future, we aim to study the redundancy of DBCs more accurately in order to find an upper bound on the number of DBCs of a certain length representing an integer of a certain size.

Also, given a leading factor, once we have an estimate of the length of the expansion, the problem remains to actually create random controlled DBC expansions such that the corresponding integers are uniformly distributed.

This question is not addressed in the present paper and will be the object of 
some future work. 
Abstract. This paper sets new speed records for high-security constant-time variable-base-point Diffie-Hellman software: 305395 Cortex-A8-slow cycles; 273349 Cortex-A8-fast cycles; 88916 Sandy Bridge cycles; 88448 Ivy Bridge cycles; 54389 Haswell cycles. There are no higher speeds in the literature for any of these platforms.

The new speeds rely on a synergy between (1) state-of-the-art formulas for genus-2 hyperelliptic curves and (2) a modern trend towards vectorization in CPUs. The paper introduces several new techniques for efficient vectorization of Kummer-surface computations.

Keywords: performance, Diffie-Hellman, hyperelliptic curves, Kummer 
surfaces, vectorization. 


1 Introduction 

The Eurocrypt 2013 paper “Fast cryptography in genus 2” by Bos, Costello, Hisil, and Lauter [17] reported 117000 cycles on Intel’s Ivy Bridge microarchitecture for high-security constant-time scalar multiplication on a genus-2 Kummer surface. The eBACS site for publicly verifiable benchmarks [13] confirms 119032 “cycles to compute a shared secret” (quartiles: 118904 and 119232) for the kumfp127g software from [17] measured on a single core of h9ivy, a 2012 Intel Core i5-3210M running at 2.5GHz. The software is not much slower on Intel’s previous microarchitecture, Sandy Bridge: eBACS reports 122716 cycles (quartiles: 122576 and 122836) for kumfp127g on h6sandy, a 2011 Intel Core i3-2310M running at 2.1GHz. (The quartiles demonstrate that rounding to a multiple of 1000 cycles, as in [17], loses statistically significant information; we follow eBACS in reporting medians of exact cycle counts.)

The paper reported that this was a “new software speed record” (“breaking 
the 120k cycle barrier”) compared to “all previous genus 1 and genus 2 imple- 
mentations” of high-security constant-time scalar multiplication. Obviously the 
genus-2 cycle counts shown above are better than the (unverified) claim of 137000 
Sandy Bridge cycles by Longa and Sica in [40] (Asiacrypt 2012) for constant- 
time elliptic-curve scalar multiplication; the (unverified) claim of 153000 Sandy 
Bridge cycles by Hamburg in [34] for constant-time elliptic-curve scalar mul- 
tiplication; the 182708 cycles reported by eBACS on h9ivy for curve25519, a 
constant-time implementation by Bernstein, Duif, Lange, Schwabe, and Yang 
[11] (CHES 2011) of Bernstein’s Curve25519 elliptic curve [9]; and the 194036 
cycles reported by eBACS on h6sandy for curve25519. 

One might conclude from these figures that genus-2 hyperelliptic-curve cryp- 
tography (HECC) solidly outperforms elliptic-curve cryptography (ECC). How- 
ever, two newer papers claim better speeds for ECC, and a closer look reveals a 
strong argument that HECC should have trouble competing with ECC. 

The first paper, [44] by Oliveira, Lopez, Aranha, and Rodríguez-Henríquez
(CHES 2013 best-paper award), is the new speed leader in eBACS for non- 
constant-time scalar multiplication; the paper reports a new Sandy Bridge speed 
record of 69500 cycles. Much more interesting for us is that the paper claims 
114800 Sandy Bridge cycles for constant-time scalar multiplication, beating [17]. 
eBACS reports 119904 cycles, but this is still faster than [17]. 

The second paper, [24] by Faz-Hernandez, Longa, and Sanchez, claims 92000 
Ivy Bridge cycles or 96000 Sandy Bridge cycles for constant-time scalar mul- 
tiplication; a July 2014 update of the paper claims 89000 Ivy Bridge cycles or 
92000 Sandy Bridge cycles. These claims are not publicly verifiable, but if they 
are even close to correct then they are faster than [17]. 

Both of these new papers, like [40], rely heavily on curve endomorphisms 
to eliminate many doublings, as proposed by Gallant, Lambert, and Vanstone 
[27] (Crypto 2001), patented by the same authors, and expanded by Galbraith, 
Lin, and Scott [26] (Eurocrypt 2009). Specifically, [44] uses a GLS curve over 
a binary field to eliminate 50% of the doublings, while also taking advantage of 
Intel’s new pclmulqdq instruction to multiply binary polynomials; [24] uses a 
GLV+GLS curve over a prime field to eliminate 75% of the doublings. 

One can also use the GLV and GLS ideas in genus 2, as explored by Bos, 
Costello, Hisil, and Lauter starting in [17] and continuing in [18] (CHES 2013). 
However, the best GLV/GLS speed reported in [18], 92000 Ivy Bridge cycles, provides only $2^{105}$ security and is not constant time. This is less impressive than the 119032 cycles from [17] for constant-time DH at a $2^{125}$ security level, and less impressive than the reports in [44] and [24].

The underlying problem for HECC is easy to explain. All known HECC ad- 
dition formulas are considerably slower than the state-of-the-art ECC addition 
formulas at the same security level. Almost all of the HECC options explored in 
[17] are bottlenecked by additions, so they were doomed from the outset, clearly
incapable of beating ECC. 

The one exception is that HECC provides an extremely fast ladder (see Sec- 
tion 2), built from extremely fast differential additions and doublings, consider- 
ably faster than the Montgomery ladder frequently used for ECC. This is why 
[17] was able to set DH speed records.

Unfortunately, differential additions do not allow arbitrary addition chains. 
Differential additions are incompatible with standard techniques for removing 
most or all doublings from fixed-base-point single-scalar multiplication, and 
with standard techniques for removing many doublings from multi-scalar mul- 
tiplication. As a consequence, differential additions are incompatible with the 
GLV+GLS approach mentioned above for removing many doublings from single- 
scalar multiplication. This is why the DH speeds from [17] were quickly superseded by DH speeds using GLV+GLS. A recent paper [22] (Eurocrypt 2014) by
Costello, Hisil, and Smith shows feasibility of combining differential additions 
and use of endomorphisms but reports 145000 Ivy Bridge cycles for constant- 
time software, much slower than the papers mentioned above. 

1.1. Contributions of This Paper. We show that HECC has an important 
compensating advantage, and we exploit this advantage to achieve new DH speed 
records. The advantage is that we are able to heavily vectorize the HECC ladder. 

CPUs are evolving towards larger and larger vector units. A low-cost low-power ARM Cortex-A8 CPU core contains a 128-bit vector unit that every two cycles can compute two vector additions, each producing four sums of 32-bit integers, or one vector multiply-add, producing two results of the form ab + c
where a, b are 32-bit integers and c is a 64-bit integer. Every cycle a Sandy 
Bridge CPU core can compute a 256-bit vector floating-point addition, producing 
four double-precision sums, and at the same time a 256-bit vector floating-point 
multiplication, producing four double-precision products. A new Intel Haswell 
CPU core can carry out two 256-bit vector multiply-add instructions every cycle. 
Intel has announced future support for 512-bit vectors (“AVX-512”). 

Vectorization has an obvious attraction for a chip manufacturer: the costs 
of decoding an instruction are amortized across many arithmetic operations. 
The challenge for the algorithm designer is to efficiently vectorize higher-level 
computations so that the available circuitry is performing useful work during 
these computations rather than sitting idle. What we show here is how to fit 
HECC with surprisingly small overhead into commonly available vector units. 
This poses several algorithmic challenges, notably to minimize the permutations 
required for the Hadamard transform (see Section 4). We claim broad applica- 
bility of our techniques to modern CPUs, and to illustrate this we analyze all 
three of the microarchitectures mentioned in the previous paragraph. 

Beware that different microarchitectures often have quite different perfor- 
mance. A paper that advertises a “better” algorithmic idea by reporting new 
record cycle counts on a new microarchitecture, not considered in the previ- 
ous literature, might actually be reporting an idea that loses performance on 
all microarchitectures. We instead emphasize HECC performance on the widely 
deployed Sandy Bridge microarchitecture, since Sandy Bridge was shared as a target by the recent ECC speed-record papers listed above. We have now set a new Sandy Bridge DH speed record, demonstrating the value of vectorized HECC. We have also set DH speed records for Ivy Bridge, Haswell, and Cortex-A8.


1.2. Constant Time: Importance and Difficulty. See the full version of this paper online at https://eprint.iacr.org/2014/134.

1.3. Performance Results. eBACS shows that on a single core of h6sandy our DH software (“kummer”) uses just 88916 Sandy Bridge cycles (quartiles: 88868 and 89184). On a single core of h9ivy our software uses 88448 cycles (quartiles: 88424 and 88476). On a single core of titan0, an Intel Xeon E3-1275 V3 (Haswell), our software uses 54389 cycles (quartiles: 54341 and 54454). On h7beagle, a TI Sitara AM3359 (Cortex-A8-slow), our software uses 305395 cycles (quartiles: 305380 and 305413). On h4mx515e, a Freescale i.MX515 (Cortex-A8-fast), our software uses 273349 cycles (quartiles: 273337 and 273387).

1.4. Cycle-Count Comparison. Table 1.5 summarizes reported high-security DH speeds for Cortex-A8, Sandy Bridge, Ivy Bridge, and Haswell.

This table is limited to software that claims to be constant time, and that 
claims a security level close to $2^{128}$. This is the reason that the table does not include, e.g., the 767000 Cortex-A8 cycles and 108000 Ivy Bridge cycles claimed
in [18] for constant-time scalar multiplication on a Kummer surface; the authors 
claim only 103 bits of security for that surface. This is also the reason that the 
table does not include, e.g., the 69500 Sandy Bridge cycles claimed in [44] for 
non-constant-time scalar multiplication. 

The table does not attempt to report whether the listed cycle counts are 
from software that actually meets the above security requirements. In some cases 
inspection of the software has shown that the security requirements are violated; 
see Section 1.2. “Open” means that the software is reported to be open source, 
allowing third-party inspection. 

Our speeds, on the same platform targeted in [17], solidly beat the HECC 
speeds from [17]. Our speeds also solidly beat the Cortex-A8, Sandy Bridge, and
Ivy Bridge speeds from all available ECC software, including [11], [15], [22], 
and [44]; solidly beat the speeds claimed in [34] and [40]; and are even faster 
than the July 2014 Sandy Bridge/Ivy Bridge DH record claimed in [24], namely 
92000/89000 cycles using unpublished software for GLV+GLS ECC. For Haswell, 
despite Haswell’s exceptionally fast binary-field multiplier, our speeds beat the 
55595 cycles from [44] for a GLS curve over a binary field. We set our new speed 
records using an HECC ladder that is conceptually much simpler than GLV and 
GLS, avoiding all the complications of scalar-dependent precomputations, lattice 
size issues, multi-scalar addition chains, endomorphism-rho security analysis, 
Weil-descent security analysis, and patents. 
Table 1.5. Reported high-security DH speeds for Cortex-A8, Sandy Bridge, Ivy Bridge, and Haswell. Cycle counts from eBACS are for curve25519, kumfp127g, gls254prot, and our kummer on h7beagle (Cortex-A8-slow), h4mx515e (Cortex-A8-fast), h6sandy (Sandy Bridge), h9ivy (Ivy Bridge), and titan0 (Haswell). Cycle counts not from SUPERCOP are marked “?”. ECC has g = 1; genus-2 HECC has g = 2. See text for security requirements.

[Table body not recoverable from the source; surviving column labels: arch, g, cycles, open.]
2 Fast Scalar Multiplication on the Kummer Surface

This section reviews the smallest number of field operations known for genus-2 scalar multiplication. Sections 3 and 4 optimize the performance of those field operations using 4-way vector instructions.

Vectorization changes the interface between this section and subsequent sec- 
tions. What we actually optimize is not individual field operations, but rather 
pairs of operations, pairs of pairs, etc., depending on the amount of vectorization 
available from the CPU. Our optimization also takes advantage of sequences of 
operations such as the output of a squaring being multiplied by a small con- 
stant. What matters in this section is therefore not merely the number of field 
multiplications, squarings, etc., but also the pattern of those operations. 

2.1. Only 25 Multiplications. Almost thirty years ago Chudnovsky and Chudnovsky wrote a classic paper [21] optimizing scalar multiplication inside the elliptic-curve method of integer factorization. At the end of the paper they also considered the performance of scalar multiplication on Jacobian varieties of genus-2 hyperelliptic curves. After mentioning various options they gave some details of one option, namely scalar multiplication on a Kummer surface.

(a) 10M + 9S + 6m ladder formulas. (b) 7M + 12S + 9m ladder formulas.

Fig. 2.2. Ladder formulas for the Kummer surface. Inputs are $X(Q-P) = (x_1 : y_1 : z_1 : t_1)$, $X(P) = (x_2 : y_2 : z_2 : t_2)$, and $X(Q) = (x_3 : y_3 : z_3 : t_3)$; outputs are $X(2P) = (x_4 : y_4 : z_4 : t_4)$ and $X(P+Q) = (x_5 : y_5 : z_5 : t_5)$. Formulas in (a) are from Gaudry [30]; diagrams are copied from Bernstein [10].

A Kummer surface is related to the Jacobian of a genus-2 hyperelliptic curve in 
the same way that x-coordinates are related to a Weierstrass elliptic curve. There 
is a standard rational map X from the Jacobian to the Kummer surface; this map 
satisfies $X(P) = X(-P)$ for points $P$ on the Jacobian and is almost everywhere exactly 2-to-1. Addition on the Jacobian does not induce an operation on the Kummer surface (unless the number of points on the surface is extremely small), but scalar multiplication $P \mapsto nP$ on the Jacobian induces scalar multiplication $X(P) \mapsto X(nP)$ on the Kummer surface. Not every genus-2 hyperelliptic curve
can have its Jacobian mapped to the standard type of Kummer surface over the 
base field, but a noticeable fraction of curves can; see [31]. 

Chudnovsky and Chudnovsky reported 14M for doubling a Kummer-surface point, where M is the cost of field multiplication; and 23M for “general addition”, presumably differential addition, computing $X(Q+P)$ given $X(P), X(Q), X(Q-P)$. They presented their formulas for doubling, commenting on a “pretty symmetry” in the formulas and on the number of multiplications that were actually squarings. They did not present their formulas for differential addition.

Two decades later, in [30], Gaudry reduced the total cost of differential addition
and doubling, computing $X(2P), X(Q+P)$ given $X(P), X(Q), X(Q-P)$,
to 25M, more precisely 16M + 9S, more precisely 10M + 9S + 6m, where S is
the cost of field squaring and m is the cost of multiplication by a curve constant.
An $\ell$-bit scalar-multiplication ladder therefore costs just $10\ell\mathrm{M} + 9\ell\mathrm{S} + 6\ell\mathrm{m}$.

(a) 10M + 9S + 6m ladder formulas. (b) 7M + 12S + 9m ladder formulas.

Fig. 2.4. Ladder formulas for the squared Kummer surface. Compare to Figure 2.2.

Gaudry’s formulas are shown in Figure 2.2(a). Each point on the Kummer
surface is expressed projectively as four field elements $(x : y : z : t)$; one is free
to replace $(x : y : z : t)$ with $(rx : ry : rz : rt)$ for any nonzero r. The “H”
boxes are Hadamard transforms, each using 4 additions and 4 subtractions; see
Section 4. The Kummer surface is parametrized by various constants $(a : b : c : d)$
and related constants $(A^2 : B^2 : C^2 : D^2) = H(a^2 : b^2 : c^2 : d^2)$. The doubling
part of the diagram, from $(x_2 : y_2 : z_2 : t_2)$ down to $(x_4 : y_4 : z_4 : t_4)$, uses
3M + 5S + 6m, matching the 14M reported by Chudnovsky and Chudnovsky;
but the rest of the picture uses just 7M + 4S extra, making remarkable reuse
of the intermediate results of doubling. Figure 2.2(b) replaces 10M + 9S + 6m
with 7M + 12S + 9m, as suggested by Bernstein in [10]; this saves time if m is
smaller than the difference M − S.
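The break-even condition follows from subtracting the two operation counts: (10M + 9S + 6m) − (7M + 12S + 9m) = 3(M − S − m), which is positive exactly when m < M − S. The Python sketch below is only an illustrative cost model; the function name and the sample per-operation costs are hypothetical, not measurements.

```python
def ladder_step_cost(M, S, m, variant):
    """Cost of one Kummer ladder step, given per-operation costs M, S, m."""
    if variant == "a":      # Gaudry's formulas, Figure 2.2(a)
        return 10 * M + 9 * S + 6 * m
    if variant == "b":      # alternative formulas, Figure 2.2(b)
        return 7 * M + 12 * S + 9 * m
    raise ValueError(f"unknown variant {variant!r}")

# Hypothetical costs in arbitrary units: (b) is cheaper only while m < M - S.
for M, S, m in [(10, 8, 1), (10, 8, 3)]:
    a = ladder_step_cost(M, S, m, "a")
    b = ladder_step_cost(M, S, m, "b")
    print(f"M={M} S={S} m={m}: (a)={a} (b)={b}")
```

With M − S = 2 in these sample units, the first line favors (b) and the second favors (a).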

2.3. The Original Kummer Surface vs. The Squared Kummer Surface. 

Chudnovsky and Chudnovsky had actually used slightly different formulas for
a slightly different surface, which we call the “squared Kummer surface”. Each
point $(x : y : z : t)$ on the original Kummer surface corresponds to a point
$(x^2 : y^2 : z^2 : t^2)$ on the squared Kummer surface. Figure 2.4 presents the
equivalent of Gaudry’s formulas for the squared Kummer surface, relabeling
$(x^2 : y^2 : z^2 : t^2)$ as $(x : y : z : t)$; the squarings at the top of Figure 2.2 have
moved close to the bottom of Figure 2.4.

The number of field operations is the same either way, as stated in [10]
with credit to Andre Augustyniak. However, the squared Kummer surface has a
computational advantage over the original Kummer surface, as pointed out by
Bernstein in [10]: constructing surfaces in which all of $a^2, b^2, c^2, d^2, A^2, B^2, C^2, D^2$
are small, producing fast multiplications by constants in Figure 2.4, is easier than
constructing surfaces in which all of $a, b, c, d, A^2, B^2, C^2, D^2$ are small, producing
fast multiplications by constants in Figure 2.2.

2.5. Preliminary Comparison to ECC. A Montgomery ladder step for ECC
costs 5M + 4S + 1m, while a ladder step on the Kummer surface costs 10M + 9S +
6m or 7M + 12S + 9m. Evidently ECC uses only about half as many operations.
However, for security ECC needs primes around 256 bits (such as the convenient
prime $2^{255} - 19$), while the Kummer surface can use primes around 128 bits
(such as the even more convenient prime $2^{127} - 1$), and presumably this saves
more than a factor of 2.
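For comparison, here is a minimal Python sketch of the x-only Montgomery ladder on Curve25519 ($y^2 = x^3 + 486662x^2 + x$ over $\mathbf{F}_p$, $p = 2^{255} - 19$). It assumes the step arrangement of RFC 7748, with the constant $(486662 - 2)/4 = 121665$, so that each step uses 5M + 4S + 1m. The function names are ours, and the sketch is neither constant-time nor a full X25519 implementation (no scalar clamping, no byte encoding). It also shows the x-coordinate analogy from above: $x(nP)$ is computed from $x(P)$ alone.

```python
P = 2**255 - 19
A24 = 121665  # (A - 2) / 4 for the Curve25519 coefficient A = 486662

def ladder_step(x1, x2, z2, x3, z3):
    """Return x(2R) and x(R + Q) from (x2:z2) = x(R), (x3:z3) = x(Q), x1 = x(Q - R)."""
    a = (x2 + z2) % P
    aa = a * a % P                          # S
    b = (x2 - z2) % P
    bb = b * b % P                          # S
    e = (aa - bb) % P                       # equals 4 * x2 * z2
    c = (x3 + z3) % P
    d = (x3 - z3) % P
    da = d * a % P                          # M
    cb = c * b % P                          # M
    x3 = (da + cb) ** 2 % P                 # S
    z3 = x1 * ((da - cb) ** 2 % P) % P      # S, then M by the difference x1
    x2 = aa * bb % P                        # M
    z2 = e * ((aa + A24 * e) % P) % P       # m (constant A24), then M
    return x2, z2, x3, z3

def x_only_mul(n, x1, bits=256):
    """x(nP) from x1 = x(P) for 0 < n < 2^bits, using only x-coordinates."""
    x2, z2, x3, z3 = 1, 0, x1, 1            # (x(0P), x(1P)); (1:0) is the point at infinity
    for i in reversed(range(bits)):
        bit = (n >> i) & 1
        if bit:                             # swap so the step doubles (k+1)P instead of kP
            x2, z2, x3, z3 = x3, z3, x2, z2
        x2, z2, x3, z3 = ladder_step(x1, x2, z2, x3, z3)
        if bit:                             # swap back into (x(jP), x((j+1)P)) order
            x2, z2, x3, z3 = x3, z3, x2, z2
    return x2 * pow(z2, P - 2, P) % P       # affine x via Fermat inversion

# Usage: a Diffie-Hellman-style consistency check from the base point x = 9.
a, b = 0x1234567, 0x89ABCDEF
print(x_only_mul(b, x_only_mul(a, 9)) == x_only_mul(a * b, 9))
```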

Several years ago, in [10], Bernstein introduced 32-bit Intel Pentium M software
for generic Kummer surfaces (i.e., m = M) taking about 10% fewer cycles
than his Curve25519 software, which at the time was the speed leader for ECC.
Gaudry, Houtmann, and Thome, as reported in [32, comparison table], introduced
64-bit software for Curve25519 and for a Kummer surface; the second
option was slightly faster on AMD Opteron K8 but the first option was slightly
faster on Intel Core 2. It is not at all clear that one can reasonably extrapolate
to today’s CPUs.

Bernstein’s cost analysis concluded that HECC could be as much as 1.5×
faster than ECC on a Pentium M (cost 1355 vs. cost 1998 in [10, page 31]),
depending on the exact size of the constants $a^2, b^2, c^2, d^2, A^2, B^2, C^2, D^2$. This
motivated a systematic search through small constants to find a Kummer surface
providing high security and high twist security. But this was more easily said
than done: genus-2 point counting was much more expensive than elliptic-curve
point counting.

2.6. The Gaudry–Schost Kummer Surface. Years later, after a 1000000-CPU-hour
computation relying on various algorithmic improvements to genus-2
point counting, Gaudry and Schost announced in [33] that they had found a
secure Kummer surface $(a^2 : b^2 : c^2 : d^2) = (11 : -22 : -19 : -3)$ over $\mathbf{F}_p$ with
$p = 2^{127} - 1$. This is exactly the surface that was used for the HECC speed
records in [17]. We obtain even better speeds for the same surface.

Note that, as mentioned by Bos, Costello, Hisil, and Lauter in [17], the constants
$(1 : a^2/b^2 : a^2/c^2 : a^2/d^2) = (1 : -1/2 : -11/19 : -11/3)$ in Figure 2.4
are projectively the same as $(-114 : 57 : 66 : 418)$. The common factor 11
between $a^2 = 11$ and $b^2 = -22$ helps keep these integers small. The constants
$(1 : A^2/B^2 : A^2/C^2 : A^2/D^2) = (1 : -3 : -33/17 : -33/49)$ are projectively the
same as $(-833 : 2499 : 1617 : 561)$.
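These constants can be reproduced with exact rational arithmetic. The Python sketch below applies the Hadamard transform, defined in Section 4 as $H(x,y,z,t) = (x+y+z+t,\ x+y-z-t,\ x-y+z-t,\ x-y-z+t)$, to $(11 : -22 : -19 : -3)$ to get $(A^2 : B^2 : C^2 : D^2)$, and then clears denominators in the ratio vectors. The helper names are ours. The sketch normalizes to a positive first coordinate, so it prints the negatives of the vectors quoted above; they represent the same projective points.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def hadamard(x, y, z, t):
    """H(x, y, z, t) written out directly (12 additions/subtractions)."""
    return (x + y + z + t, x + y - z - t, x - y + z - t, x - y - z + t)

def ratio_vector(v):
    """(1 : v0/v1 : v0/v2 : v0/v3) with exact rationals."""
    return [Fraction(1)] + [Fraction(v[0], vi) for vi in v[1:]]

def to_small_integers(r):
    """Scale a projective vector of rationals to coprime integers, first entry positive."""
    den = reduce(lambda a, b: a * b // gcd(a, b), (q.denominator for q in r), 1)
    ints = [int(q * den) for q in r]
    g = reduce(gcd, (abs(i) for i in ints))
    sign = 1 if ints[0] > 0 else -1
    return tuple(sign * i // g for i in ints)

small = (11, -22, -19, -3)                     # (a^2 : b^2 : c^2 : d^2)
big = hadamard(*small)                         # (A^2 : B^2 : C^2 : D^2)
print(big)                                     # (-33, 11, 17, 49)
print(to_small_integers(ratio_vector(small)))  # (114, -57, -66, -418)
print(to_small_integers(ratio_vector(big)))    # (833, -2499, -1617, -561)
```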

3 Decomposing Field Multiplication 

The only operations in Figures 2.2 and 2.4 are the H boxes, which we analyze
in Section 4, and field multiplications, which we analyze in this section. Our
goal here is to obtain the smallest possible number of CPU cycles for M, S,
etc. modulo $p = 2^{127} - 1$.

This prime has been considered before, for example in [8] and [10]. What is
new here is fitting arithmetic modulo this prime, for the pattern of operations
shown in Figure 2.4, into the vector abilities of modern CPUs. There are four
obvious dimensions of vectorizability:

• Vectorizing across the “limbs” that represent a field element such as $x_2$. The
most obvious problem with this approach is that, when f is multiplied by
g, each limb of f needs to communicate with each limb of g and each limb
of output. A less obvious problem is that the optimal number of limbs is
CPU-dependent and is usually nonzero modulo the vector length. Each of
these problems poses a challenge in organizing and reshuffling data inside
multiplications.

• Vectorizing across the four field elements that represent a point. All of the
multiplications in Figure 2.4 are visually organized into 4-way vectors, except
that in some cases the vectors have been scaled to create a multiplication
by 1. Even without vectorization, most of this scaling is undesirable for
any surface with small $a^2, b^2, c^2, d^2$: e.g., for the Gaudry–Schost surface we
replace $(1 : a^2/b^2 : a^2/c^2 : a^2/d^2)$ with $(-114 : 57 : 66 : 418)$. The only
remaining exception is the multiplication by 1 in $(1 : x_1/y_1 : x_1/z_1 : x_1/t_1)$,
where $X(Q-P) = (x_1 : y_1 : z_1 : t_1)$. Vectorizing across the four field
elements means that this multiplication costs 1M, increasing the cost of a
ladder step from 7M + 12S + 12m to 8M + 12S + 12m.

• Vectorizing between doubling and differential addition. For example, in Fig- 
ure 2.4(b), squarings are imperfectly paired with multiplications on the third 
line; multiplications by constants are perfectly paired with multiplications 
by the same constants on the fourth line; squarings are perfectly paired with 
squarings on the sixth line; and multiplications by constants are imperfectly 
paired with multiplications by inputs on the seventh line. There is some loss 
of efficiency in, e.g., pairing the squaring with the multiplication, since this 
prohibits using faster squaring methods. 

• Vectorizing across a batch of independent scalar-multiplication inputs, in ap- 
plications where a suitably sized batch is available. This is relatively straight- 
forward but increases cache traffic, often to problematic levels. In this paper 
we focus on the traditional case of a single input. 

The second dimension of vectorizability is, as far as we know, a unique feature 
of HECC, and one that we heavily exploit for high performance. 

For comparison, one can try to vectorize the well-known Montgomery ladder
for ECC [42] across the field elements that represent a point, but (1) this provides
only two-way vectorization (x and z), not four-way vectorization; and (2) many of
the resulting pairings are imperfect. The Montgomery ladder for Curve25519 was
vectorized by Costigan and Schwabe in [23] for the Cell, and then by Bernstein
and Schwabe in [15] for the Cortex-A8, but both of those vectorizations had
substantially higher overhead than our new vectorization of the HECC ladder.

3.1. Sandy Bridge Floating-Point Units. The only fast multiplier available
on Intel’s 32-bit platforms for many years, from the original Pentium twenty
years ago through the Pentium M, was the floating-point multiplier. This was
exploited by Bernstein for cryptographic computations in [8], [9], etc.

The conventional wisdom is that this use of floating-point arithmetic was 
rendered obsolete by the advent of 64-bit platforms: in particular, Intel now 
provides a reasonably fast 64-bit integer multiplier. However, floating-point units 
have also become more powerful; evidently Intel sees many applications that rely 
critically upon fast floating-point arithmetic. We therefore revisit Bernstein’s 
approach, with the added challenge of vectorization. 

We next describe the relevant features of the Sandy Bridge; see [25] for more 
information. Our optimization of HECC for the Sandy Bridge occupies the rest 
of Sections 3 and 4. The Ivy Bridge has the same features and should be expected 
to produce essentially identical performance for this type of code. The Haswell 
has important differences and is analyzed in Appendix B online; the Cortex-A8
is analyzed in Section 5.

Each Sandy Bridge core has several 256-bit vector units operating in parallel 
on vectors of 4 double-precision floating-point numbers: 

• “Port 0” handles one vector multiplication each cycle, with latency 5. 

• Port 1 handles one vector addition each cycle, with latency 3. 

• Port 5 handles one permutation instruction each cycle. The selection of per- 
mutation instructions is limited and is analyzed in detail in Section 4. 

• Ports 2, 3, and 4 handle vector loads and stores, with latency 4 from L1
cache and latency 3 to L1 cache. Load/store throughput is limited in various
ways, never exceeding one 256-bit load per cycle. 

Recall that a double-precision floating-point number occupies 64 bits, including
a sign bit, a power of 2, and a “mantissa”. Every integer between $-2^{53}$ and $2^{53}$
can be represented exactly as a double-precision floating-point number. More
generally, every real number of the form $2^e i$, where e is a small integer and i is an
integer between $-2^{53}$ and $2^{53}$, can be represented exactly as a double-precision
floating-point number. The computations discussed here do not approach the
lower or upper limits on e, so we do not review the details of the limits.
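These exactness claims can be checked directly, on the assumption (true on all mainstream platforms) that Python’s float is an IEEE 754 binary64 double. The helper name below is ours.

```python
import math

def exact_as_double(n: int) -> bool:
    """True if the integer n is unchanged by a round trip through a binary64 double."""
    return int(float(n)) == n

print(exact_as_double(2**53), exact_as_double(-(2**53)))  # True True
print(exact_as_double(2**53 + 1))                          # False: needs 54 significant bits
# 2^e * i with |i| <= 2^53 is also exact, e.g. a limb that is a multiple of 2^106:
i = 2**53 - 1
print(exact_as_double(i << 106), math.ldexp(float(i), 106) == i << 106)  # True True
```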

Our final software uses fewer multiplications than additions, and fewer per- 
mutations than multiplications. This does not mean that we were free to use 
extra multiplications and permutations: if multiplications and permutations are 
not finished quickly enough then the addition unit will sit idle waiting for input. 
In many cases, noted below, we have the flexibility to convert multiplications to 
additions, reducing latency; we found that in some cases this saved time despite 
the obvious addition bottleneck. 

3.2. Optimizing M (Field Multiplication). We decompose an integer f modulo
$2^{127} - 1$ into six floating-point limbs in (non-integer) radix $2^{127/6}$. This means
that we write f as $f_0 + f_1 + f_2 + f_3 + f_4 + f_5$, where $f_0$ is a small multiple of $2^0$,
$f_1$ is a small multiple of $2^{22}$, $f_2$ is a small multiple of $2^{43}$, $f_3$ is a small multiple of $2^{64}$,
$f_4$ is a small multiple of $2^{85}$, and $f_5$ is a small multiple of $2^{106}$. (The exact meaning
of “small” is defined by a rather tedious, but verifiable, collection of bounds
on the floating-point numbers appearing in each step of the program. It should
be obvious that a simpler definition of “small” would compromise efficiency; for
example, H cannot be efficient unless the bounds on H intermediate results and
outputs are allowed to be larger than the bounds on H inputs.)

If g is another integer similarly decomposed as $g_0 + g_1 + g_2 + g_3 + g_4 + g_5$,
then $f_0 g_0$ is a multiple of $2^0$, $f_0 g_1 + f_1 g_0$ is a multiple of $2^{22}$, $f_0 g_2 + f_1 g_1 + f_2 g_0$
is a multiple of $2^{43}$, etc. Each of these sums is small enough to fit exactly in a
double-precision floating-point number, and the total of these sums is exactly
fg. What we actually compute are the sums


$$
\begin{aligned}
h_0 &= f_0 g_0 + 2^{-127} f_1 g_5 + 2^{-127} f_2 g_4 + 2^{-127} f_3 g_3 + 2^{-127} f_4 g_2 + 2^{-127} f_5 g_1,\\
h_1 &= f_0 g_1 + f_1 g_0 + 2^{-127} f_2 g_5 + 2^{-127} f_3 g_4 + 2^{-127} f_4 g_3 + 2^{-127} f_5 g_2,\\
h_2 &= f_0 g_2 + f_1 g_1 + f_2 g_0 + 2^{-127} f_3 g_5 + 2^{-127} f_4 g_4 + 2^{-127} f_5 g_3,\\
h_3 &= f_0 g_3 + f_1 g_2 + f_2 g_1 + f_3 g_0 + 2^{-127} f_4 g_5 + 2^{-127} f_5 g_4,\\
h_4 &= f_0 g_4 + f_1 g_3 + f_2 g_2 + f_3 g_1 + f_4 g_0 + 2^{-127} f_5 g_5,\\
h_5 &= f_0 g_5 + f_1 g_4 + f_2 g_3 + f_3 g_2 + f_4 g_1 + f_5 g_0,
\end{aligned}
$$

whose total h is congruent to fg modulo $2^{127} - 1$.






There are 36 multiplications $f_i g_j$ here, and 30 additions. (This operation
count does not include carries; we analyze carries below.) One can collect the
multiplications by $2^{-127}$ into 5 multiplications such as $2^{-127}(f_4 g_5 + f_5 g_4)$. We
use another approach, precomputing $2^{-127} f_1, 2^{-127} f_2, 2^{-127} f_3, 2^{-127} f_4, 2^{-127} f_5$,
for two reasons: first, this reduces the latency of each $h_i$ computation, giving
us more flexibility in scheduling; second, this gives us an opportunity to share
precomputations when the input f is reused for another multiplication.
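A scalar Python model of this multiplication is sketched below. Python floats are binary64 doubles, so the exactness argument can be tested directly. The model has no vectorization and no carries, and the helper names are ours. It takes the limb boundaries 0, 22, 43, 64, 85, 106 from the text and, as one admissible choice of “small”, uses the nonnegative limbs of a plain binary split (after carrying, the real limbs are signed).

```python
import random

P = 2**127 - 1
POS = [0, 22, 43, 64, 85, 106, 127]   # limb i is a multiple of 2^POS[i]
EPS = 2.0 ** -127                     # 2^-127, exact as a double

def to_limbs(f):
    """Split 0 <= f < 2^127 into six doubles f_0..f_5 whose sum is exactly f."""
    return [float(((f >> POS[i]) & ((1 << (POS[i + 1] - POS[i])) - 1)) << POS[i])
            for i in range(6)]

def mul_limbs(f, g):
    """h_0..h_5 from the equations above: 36 products and 30 additions."""
    eps_f = {i: EPS * f[i] for i in range(1, 6)}   # the 5 precomputed 2^-127 f_i
    h = []
    for k in range(6):
        terms = [f[i] * g[k - i] for i in range(k + 1)]              # i + j = k
        terms += [eps_f[i] * g[k + 6 - i] for i in range(k + 1, 6)]  # i + j = k + 6
        acc = terms[0]
        for term in terms[1:]:
            acc += term   # partial sums are multiples of 2^POS[k] well below 2^(POS[k]+53): no rounding
        h.append(acc)
    return h

f, g = random.randrange(P), random.randrange(P)
h = mul_limbs(to_limbs(f), to_limbs(g))
print(sum(int(x) for x in h) % P == f * g % P)   # True
```

Each $h_k$ comes out as an integer-valued double, so `int()` recovers it exactly before the congruence check.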

3.3. Optimizing S (Field Squaring) and m (Constant Field Multiplication).
For S, i.e., for f = g, we have

$$
\begin{aligned}
h_0 &= f_0 f_0 + \epsilon 2 f_1 f_5 + \epsilon 2 f_2 f_4 + \epsilon f_3 f_3, & h_1 &= 2 f_0 f_1 + \epsilon 2 f_2 f_5 + \epsilon 2 f_3 f_4,\\
h_2 &= 2 f_0 f_2 + f_1 f_1 + \epsilon 2 f_3 f_5 + \epsilon f_4 f_4, & h_3 &= 2 f_0 f_3 + 2 f_1 f_2 + \epsilon 2 f_4 f_5,\\
h_4 &= 2 f_0 f_4 + 2 f_1 f_3 + f_2 f_2 + \epsilon f_5 f_5, & h_5 &= 2 f_0 f_5 + 2 f_1 f_4 + 2 f_2 f_3,
\end{aligned}
$$

where $\epsilon = 2^{-127}$. We precompute $2f_1, 2f_2, 2f_3, 2f_4, 2f_5$ and $\epsilon f_3, \epsilon f_4, \epsilon f_5$; this
costs 8 multiplications, where 5 of the multiplications can be freely replaced by
additions. The rest of S, after this precomputation, takes 21 multiplications and
15 additions, plus the cost of carries.
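The squaring formulas translate directly into a scalar Python sketch. As in the model of M, the helper names and the nonnegative binary-split limbs are our assumptions. Each term $\epsilon 2 f_i f_j$ is formed as the product of the two precomputed values $(2f_i)(\epsilon f_j)$, which gives exactly 21 multiplications and 15 additions after the 8 precomputations.

```python
import random

P = 2**127 - 1
POS = [0, 22, 43, 64, 85, 106, 127]
EPS = 2.0 ** -127                     # epsilon = 2^-127

def to_limbs(f):
    """Split 0 <= f < 2^127 into six doubles f_0..f_5 whose sum is exactly f."""
    return [float(((f >> POS[i]) & ((1 << (POS[i + 1] - POS[i])) - 1)) << POS[i])
            for i in range(6)]

def sqr_limbs(f):
    """h_0..h_5 for f*f: 8 precomputations, then 21 multiplications and 15 additions."""
    f0, f1, f2, f3, f4, f5 = f
    d1, d2, d3, d4, d5 = 2 * f1, 2 * f2, 2 * f3, 2 * f4, 2 * f5   # could be f_i + f_i
    e3, e4, e5 = EPS * f3, EPS * f4, EPS * f5
    return [
        f0 * f0 + d1 * e5 + d2 * e4 + f3 * e3,   # h0
        f0 * d1 + d2 * e5 + d3 * e4,             # h1
        f0 * d2 + f1 * f1 + d3 * e5 + f4 * e4,   # h2
        f0 * d3 + d1 * f2 + d4 * e5,             # h3
        f0 * d4 + d1 * f3 + f2 * f2 + f5 * e5,   # h4
        f0 * d5 + d1 * f4 + d2 * f3,             # h5
    ]

f = random.randrange(P)
print(sum(int(x) for x in sqr_limbs(to_limbs(f))) % P == f * f % P)   # True
```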

For m we have simply $h_0 = c f_0$, $h_1 = c f_1$, etc., costing 6 multiplications plus
the cost of carries. This does not work for arbitrary field constants, but it does
work for the small constants stated in Section 2.6.

3.4. Carries. The output limbs $h_i$ from M are too large to be used in a
subsequent multiplication. We carry $h_0 \to h_1$ by rounding $2^{-22} h_0$ to an integer
$c_0$, adding $2^{22} c_0$ to $h_1$, and subtracting $2^{22} c_0$ from $h_0$. This takes 3 additions
(the CPU has a rounding instruction, vroundpd, that costs just 1 addition) and
2 multiplications. The resulting $h_0$ is guaranteed to be between $-2^{21}$ and $2^{21}$.

We could similarly carry $h_1 \to h_2 \to h_3 \to h_4 \to h_5$, and carry $h_5 \to h_0$
as follows: round $2^{-127} h_5$ to an integer $c_5$, add $c_5$ to $h_0$, and subtract $2^{127} c_5$
from $h_5$. One final carry $h_0 \to h_1$, for a total of 7 carries (21 additions and 14
multiplications), would then guarantee that all of $h_0, h_1, h_2, h_3, h_4, h_5$ are small
enough to be input to a subsequent multiplication.
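The serial 7-carry chain can be sketched in Python as follows. The function names are ours, and Python’s `round()` (round half to even) stands in for vroundpd, which is an assumption about the rounding mode. The $h_5 \to h_0$ carry relies on $2^{127} \equiv 1 \pmod{2^{127} - 1}$, so moving $2^{127} c_5$ out of $h_5$ and $c_5$ into $h_0$ preserves the value modulo p.

```python
import random

P = 2**127 - 1
POS = [0, 22, 43, 64, 85, 106, 127]   # limb boundaries; 2^127 wraps around to 2^0

def carry(h, i):
    """Carry h_i -> h_{i+1} (h_5 -> h_0 when i = 5): 2 multiplications, 3 additions."""
    scale = 2.0 ** POS[i + 1]                  # 2^22, 2^43, ..., 2^127
    c = float(round(h[i] * (1.0 / scale)))     # nearest integer to 2^-POS[i+1] h_i
    t = c * scale
    h[i] -= t                                  # now |h_i| <= scale / 2
    if i < 5:
        h[i + 1] += t
    else:
        h[0] += c                              # 2^127 c is congruent to c mod 2^127 - 1

def carry_chain(h):
    """h0 -> h1 -> h2 -> h3 -> h4 -> h5 -> h0 -> h1: 7 serial carries."""
    for i in [0, 1, 2, 3, 4, 5, 0]:
        carry(h, i)
    return h

# Hypothetical unreduced limbs, each up to 2^47 times its limb position.
h = [float(random.randrange(-2**47, 2**47)) * 2.0 ** POS[k] for k in range(6)]
before = sum(int(x) for x in h)
carry_chain(h)
print(sum(int(x) for x in h) % P == before % P, abs(h[0]) <= 2**21)   # True True
```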

The problem with this carry chain is that it has extremely high latency: 5
cycles for $2^{-22} h_0$, 3 more cycles for $c_0$, 5 more cycles for $2^{22} c_0$, and 3 more
cycles to add to $h_1$, all repeated 7 times, for a total of 112 cycles, plus the
latency of obtaining $h_0$ in the first place. The ladder step in Figure 2.4 has a
serial chain of $H \to M \to m \to H \to S \to M$, for a total latency above 500
cycles, i.e., above 125500 cycles for a 251-bit ladder.

We do better in six ways. First, we use only 6 carries in M rather than 7, if the
output will be used only for m. Even if the output $h_0$ is several bits larger than
$2^{22}$, it will not overflow the small-constant multiplication, since our constants
are all bounded by $2^{12}$.

Second, pushing the same idea further, we do these 6 carries in parallel. First
we round in parallel to obtain $c_0, c_1, c_2, c_3, c_4, c_5$, then we subtract in parallel,
then we add in parallel, allowing all of $h_0, h_1, h_2, h_3, h_4, h_5$ to end up several bits
larger than they would have been with full carries.

Third, we also use 6 parallel carries for a multiplication that is an m. There
is no need for a chain, since the initial $h_0, h_1, h_2, h_3, h_4, h_5$ cannot be very large.

Fourth, we also use 6 parallel carries for each S. This allows the S output
to be somewhat larger than the input, but this still does not create overflows
in the subsequent M. At this point the only remaining block of 7 carries is in
the M4 by $(1 : x_1/y_1 : x_1/z_1 : x_1/t_1)$, where M4 means a vector of four field
multiplications.

Fifth, for that M4, we run two carry chains in parallel, carrying $h_0 \to h_1$ and
$h_3 \to h_4$, then $h_1 \to h_2$ and $h_4 \to h_5$, then $h_2 \to h_3$ and $h_5 \to h_0$, then $h_3 \to h_4$
and $h_0 \to h_1$. This costs 8 carries rather than 7 but chops latency in half.

Finally, for that M4, we use the carry approach from [8]: add the constant
$\alpha_{22} = 2^{22}(2^{52} + 2^{51})$ to $h_0$, and subtract $\alpha_{22}$ from the result, obtaining the closest
multiple of $2^{22}$ to $h_0$; add this multiple to $h_1$ and subtract it from $h_0$. This costs
4 additions rather than 3, but reduces carry latency from 16 to 9, and also saves
two multiplications.
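The trick works because doubles near $\alpha_{22} = 1.5 \cdot 2^{74}$ are spaced exactly $2^{22}$ apart, so the rounding inside the first addition performs the carry’s rounding step for free. The Python sketch below assumes IEEE 754 binary64 arithmetic with round-to-nearest-even, which holds for Python floats on mainstream platforms, and assumes $|h_0| < 2^{73}$ so that the sum stays in the binade of $\alpha_{22}$. The function name is ours.

```python
import random

ALPHA_22 = 2.0**22 * (2.0**52 + 2.0**51)   # 1.5 * 2^74; neighboring doubles differ by 2^22

def carry_alpha(h0, h1):
    """Carry h0 -> h1 using 4 additions and no multiplications (needs |h0| < 2^73)."""
    t = (h0 + ALPHA_22) - ALPHA_22   # hardware rounding leaves the multiple of 2^22 nearest h0
    return h0 - t, h1 + t

h0 = float(random.randrange(-2**52, 2**52))
lo, hi = carry_alpha(h0, 0.0)
print(abs(lo) <= 2**21, hi % 2**22 == 0, lo + hi == h0)   # True True True
```

Ties round to an even multiple of $2^{22}$, matching the behavior of Python’s `round()`.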

4 Permutations: Vectorizing the Hadamard Transform 

The Hadamard transform H in Section 2 is defined as follows:
$H(x, y, z, t) = (x + y + z + t,\ x + y - z - t,\ x - y + z - t,\ x - y - z + t)$. Evaluating this as written
would use 12 field additions (counting subtraction as addition), but a standard
“fast Hadamard transform” reduces the 12 to 8.
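The 8-addition version is the usual two-layer butterfly: first form $x \pm y$ and $z \pm t$, then combine those sums and differences. The Python sketch below is a plain scalar illustration with function names of our choosing. It compares the butterfly with the 12-operation definition.

```python
def hadamard_direct(x, y, z, t):
    """H as written: 12 additions/subtractions."""
    return (x + y + z + t, x + y - z - t, x - y + z - t, x - y - z + t)

def hadamard_fast(x, y, z, t):
    """Two butterfly layers: 4 + 4 = 8 additions/subtractions."""
    s, d = x + y, x - y                    # layer 1
    u, v = z + t, z - t
    return (s + u, s - u, d + v, d - v)    # layer 2

print(hadamard_fast(1, 2, 3, 4) == hadamard_direct(1, 2, 3, 4))   # True
```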

Our representation of field elements for the Sandy Bridge (see Section 3) 
requires 6 limb additions for each field addition. There is no need to carry before 
the subsequent multiplications; this is the main reason that we use 6 limbs rather 
than 5. 

In a ladder step there are 4 copies of H, each requiring 8 field additions,
each requiring 6 limb additions, for a total of 192 limb additions. This operation
count suggests that 48 vector instructions suffice. Sandy Bridge has a helpful
vaddsubpd instruction that computes $(a - e, b + f, c - g, d + h)$ given $(a, b, c, d)$
and $(e, f, g, h)$, obviously useful inside H.

However, we cannot simply vectorize across x, y, z, t. In Section 3 we were
multiplying one x by another, at the same time multiplying one y by another,
etc., with no permutations required; in this section we need to add x to y, and
this requires permutations.

The Sandy Bridge has a vector permutation unit acting in parallel with the 
adder and the multiplier, as noted in Section 3. But this does not mean that the 
cost of permutations can be ignored. A long sequence of permutations inside H 
will force the adder and the multiplier to remain idle, since only a small fraction 
of the work inside M can begin before H is complete. 

Our original software used 48 vector additions and 144 vector permutations 
for the 4 copies of H. We then tackled the challenge of minimizing the number 
of permutations. We ended up reducing this number from 144 to just 36. This 
section presents the details; analyzes conditional swaps, which end up consum- 
ing further time in the permutation unit; and concludes by analyzing the total 
number of operations used in our Sandy Bridge software. 

4.1. Limitations of the Sandy Bridge Permutations. There is a latency-1
permutation instruction vpermilpd that computes $(y, x, t, z)$ given $(x, y, z, t)$.
vaddsubpd then produces $(x - y, y + x, z - t, t + z)$, which for the moment we
abbreviate as $(e, f, g, h)$. At this point we seem to be halfway done: the desired
output is simply $(f + h, f - h, e + g, e - g)$.

If we had $(f, h, e, g)$ at this point, rather than $(e, f, g, h)$, then we could apply
vpermilpd and vaddsubpd again, obtaining $(f - h, h + f, e - g, g + e)$. One
final vpermilpd would then produce the desired $(f + h, f - h, e + g, e - g)$. The
remaining problem is the middle permutation of $(e, f, g, h)$ into $(f, h, e, g)$.

Unfortunately, Sandy Bridge has very few options for moving data between
the left half of a vector, in this case $(e, f)$, and the right half of a vector, in this
case $(g, h)$. There is a vperm2f128 instruction (1-cycle throughput but latency
2) that produces $(g, h, e, f)$, but it cannot even produce $(h, g, f, e)$, never mind a
combination such as $(f, h, e, g)$. (Haswell has more permutation instructions, but
Ivy Bridge does not. This is not a surprising restriction: n-bit vector units are
often designed as n/2-bit vector units operating on the left half of a vector in one
cycle and the right half in the next cycle, but this means that any communication
between left and right requires careful attention in the circuitry. A similar left-right
separation is even more obvious for the Cortex-A8.) We could shift some
permutation work to the load/store unit, but this would have very little benefit,
since simulating a typical permutation requires quite a few loads and stores.

The vpermilpd instruction $(x, y, z, t) \mapsto (y, x, t, z)$ mentioned above is one of
a family of 16 vpermilpd instructions that produce (x or y, x or y, z or t, z or t).
There is an even more general family of 16 vshufpd instructions that produce
(a or b, x or y, c or d, z or t) given $(a, b, c, d)$ and $(x, y, z, t)$. In the first
versions of our software we applied vshufpd to $(e, f, g, h)$ and $(g, h, e, f)$,
obtaining $(f, h, g, e)$, and then applied vpermilpd to obtain $(f, h, e, g)$.
Overall a single H handled in this way uses, for each limb, 2 vaddsubpd
instructions and 6 permutation instructions, half of which are handling the
permutation of $(e, f, g, h)$ into $(f, h, e, g)$. The total for all limbs is 12 additions and
36 permutations, and the large “bubble” of permutations ends up forcing many
idle cycles for the addition unit. This occurs four times in each ladder step.

4.2. Changing the Input/Output Format. There are two obvious sources
of inefficiency in the computation described above. First, we need a final
permutation to convert $(f - h, f + h, e - g, e + g)$ into $(f + h, f - h, e + g, e - g)$. Second,
the middle permutation of $(e, f, g, h)$ into $(f, h, e, g)$ costs three permutation
instructions, whereas $(g, h, e, f)$ would cost only one.

The first problem arises from a tension between Intel’s vaddsubpd, which always
subtracts in the first position, and the definition of H, which always adds in
the first position. A simple way to resolve this tension is to store $(t, z, y, x)$ instead
of $(x, y, z, t)$ for the input, and $(t', z', y', x')$ instead of $(x', y', z', t')$ for the output;
the final permutation then naturally disappears. It is easy to adjust the other
permutations accordingly, along with constants such as $(1, a^2/b^2, a^2/c^2, a^2/d^2)$.

However, this does nothing to address the second problem. Different permutations
of $(x, y, z, t)$ as input and output end up requiring different middle
permutations, but these middle permutations are never exactly the left-right
swap provided by vperm2f128.

We do better by generalizing the input/output format to allow negations.
For example, if we start with $(x, -y, z, t)$, permute into $(-y, x, t, z)$, and apply
vaddsubpd, we obtain $(x + y, x - y, z - t, t + z)$. Observe that this is not the same
as the $(x - y, x + y, z - t, t + z)$ that we obtained earlier: the first two entries
have been exchanged.

It turns out to be best to negate z, i.e., to start from $(x, y, -z, t)$. Then
vpermilpd gives $(y, x, t, -z)$, and vaddsubpd gives $(x - y, x + y, -z - t, t - z)$,
which we now abbreviate as $(e, f, g, h)$. Next vperm2f128 gives $(g, h, e, f)$, and
independently vpermilpd gives $(f, e, h, g)$. Finally, vaddsubpd gives
$(f - g, h + e, h - e, f + g)$. This is exactly $(x', t', -z', y')$ where $(x', y', z', t') = H(x, y, z, t)$.

The output format here is not the same as the input format: the positions of
t and y have been exchanged. Fortunately, Figure 2.4 is partitioned by the H
rows into two separate universes, and there is no need for the universes to use
the same format. We use the $(x, y, -z, t)$ format at the top and bottom, and the
$(x, t, -z, y)$ format between the two H rows. It is easy to see that exactly the
same sequence of instructions works for all the copies of H, either producing
$(x, y, -z, t)$ format from $(x, t, -z, y)$ format or vice versa.
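The following Python sketch is a scalar model of this instruction sequence, with one 4-tuple standing for one limb of each of the four coordinates. The three helpers model only the specific instruction behavior described in the text: vaddsubpd as $(a - e, b + f, c - g, d + h)$, the vpermilpd variant $(x, y, z, t) \mapsto (y, x, t, z)$, and the vperm2f128 half-swap. All function names are ours. The sequence uses 2 vaddsubpd and 3 permutations per limb, i.e., the 12 additions and 18 permutations per H stated below.

```python
def vpermilpd_swap(a):
    """Model of the vpermilpd variant (x, y, z, t) -> (y, x, t, z)."""
    return (a[1], a[0], a[3], a[2])

def vperm2f128_swap(a):
    """Model of vperm2f128 exchanging the 128-bit halves: (x, y, z, t) -> (z, t, x, y)."""
    return (a[2], a[3], a[0], a[1])

def vaddsubpd(a, b):
    """Model of vaddsubpd: (a0 - b0, a1 + b1, a2 - b2, a3 + b3)."""
    return (a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3])

def hadamard_negated_format(v):
    """One limb of H in the negated format: 2 vaddsubpd and 3 permutations."""
    efgh = vaddsubpd(v, vpermilpd_swap(v))                          # (x-y, x+y, -z-t, t-z)
    return vaddsubpd(vpermilpd_swap(efgh), vperm2f128_swap(efgh))   # (f-g, h+e, h-e, f+g)

def hadamard(x, y, z, t):
    return (x + y + z + t, x + y - z - t, x - y + z - t, x - y - z + t)

# (x, y, -z, t) with (x, y, z, t) = (3, 5, 7, 11) yields (x', t', -z', y').
print(hadamard_negated_format((3, 5, -7, 11)))   # (26, 2, 6, -10)
```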

S4 and M4 do not preserve negations: in effect, they switch from $(x, t, -z, y)$
format to $(x, t, z, y)$ format. This is not a big problem, since we can reinsert
the negation at any moment using a single multiplication or low-latency logic
instruction (floating-point numbers use a sign bit rather than two’s complement,
so negation is simply xor with a 1 in the sign bit). Even better, in Figure 2.4(b),
the problem disappears entirely: each S4 and M4 is followed immediately by a
constant multiplication, and so we simply negate the appropriate constants. The
resulting sequence of formats is summarized in Figure 4.3.
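The sign-bit negation can be modeled on a single double with Python’s struct module. The helper name is ours, and this scalar function only imitates what a vector xor does in each 64-bit lane.

```python
import struct

SIGN_BIT = 1 << 63   # the top bit of an IEEE 754 binary64 value

def negate_by_xor(x: float) -> float:
    """Negate a double by flipping its sign bit, as a vector xor would do per lane."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ SIGN_BIT))
    return y

print(negate_by_xor(3.5))   # -3.5
```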
Fig. 4.3. Output format that we use for each operation in the right side of Figure 2.4
on Sandy Bridge, including permutations and negations to accelerate H.


Each H now costs 12 additions and just 18 permutations. The number of 
non-addition cycles that need to be overlapped with operations before and after 
H has dropped from the original 24 to just 6. 

4.4. Exploiting Double Precision. We gain a further factor of 2 by temporarily
converting from radix $2^{127/6}$ to radix $2^{127/3}$ during the computation of H. This
means that, just before starting H, we replace the six limbs $(h_0, h_1, h_2, h_3, h_4, h_5)$
representing $h_0 + h_1 + h_2 + h_3 + h_4 + h_5$ by three limbs $(h_0 + h_1, h_2 + h_3, h_4 + h_5)$.
These three sums, and the intermediate H results, still fit into double-precision
floating-point numbers.

It is essential to switch each output integer back to radix $2^{127/6}$ so that each
output limb is small enough for the subsequent multiplication. Converting three
limbs into six is slightly less expensive than three carries; in fact, converting from
six to three and back to six uses exactly the same operations as three carries,
although in a different order.
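A scalar Python sketch of this round trip follows: pair up the six limbs, apply the 8-addition H separately at each of the three limb positions, and split each result back at $2^{22}$, $2^{64}$, $2^{106}$. The function names and the input limb sizes are hypothetical. The conversions cost 3 additions going down and 2 multiplications plus 2 additions per limb coming back, i.e., 9 additions and 6 multiplications, the same as three carries.

```python
import random

POS = [0, 22, 43, 64, 85, 106]   # radix 2^(127/6) limb positions

def six_to_three(h):
    """(h0, ..., h5) -> (h0 + h1, h2 + h3, h4 + h5): radix-2^(127/3) limbs, 3 additions."""
    return [h[0] + h[1], h[2] + h[3], h[4] + h[5]]

def three_to_six(s):
    """Split each sum at 2^22, 2^64, 2^106: 2 multiplications and 2 additions per limb."""
    h = []
    for k in range(3):
        scale = 2.0 ** POS[2 * k + 1]
        high = float(round(s[k] * (1.0 / scale))) * scale   # rounding counts as an addition
        h += [s[k] - high, high]
    return h

def hadamard_limbwise(x, y, z, t):
    """Apply the 8-addition fast H independently at each limb position."""
    out = ([], [], [], [])
    for a, b, c, d in zip(x, y, z, t):
        s, dif, u, v = a + b, a - b, c + d, c - d
        for lst, val in zip(out, (s + u, s - u, dif + v, dif - v)):
            lst.append(val)
    return out

def value(limbs):
    return sum(int(l) for l in limbs)   # limbs are integer-valued doubles

# Four field elements as signed six-limb vectors (hypothetical sizes after carrying).
elems = [[float(random.randrange(-2**21, 2**21)) * 2.0**p for p in POS] for _ in range(4)]
outs = [three_to_six(o) for o in hadamard_limbwise(*[six_to_three(e) for e in elems])]
x, y, z, t = (value(e) for e in elems)
print([value(o) for o in outs] == [x + y + z + t, x + y - z - t, x - y + z - t, x - y - z + t])  # True
```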

We further reduce the conversion cost by the following observation. Except
for the M4 by $(1 : x_1/y_1 : x_1/z_1 : x_1/t_1)$, each of our multiplication results uses
six carries, as explained in Section 3.4. However, if we are about to add $h_0$ to $h_1$
for input to H, then there is no reason to carry $h_0 \to h_1$, so we simply skip that
carry; we similarly skip $h_2 \to h_3$ and $h_4 \to h_5$. These skipped carries exactly
cancel the conversion cost.

For the M4 by $(1 : x_1/y_1 : x_1/z_1 : x_1/t_1)$ the analysis is different: $h_0$ is large
enough to affect $h_2$, and if we skipped carrying $h_0 \to h_1 \to h_2$ then the output
of H would no longer be safe as input to a subsequent multiplication. We thus
carry $h_0 \to h_1$, $h_2 \to h_3$, and $h_4 \to h_5$ in parallel; and then $h_1 \to h_2$, $h_3 \to h_4$,
and $h_5 \to h_0$ in parallel. In effect this M4 uses 9 carries, counting the cost of
conversion, whereas in Section 3.4 it used only 8.

To summarize, all of these conversions for all four H cost just one extra 
carry, while reducing 48 additions and 72 permutations to 24 additions and 36 
permutations. 

4.5. Conditional Swaps. A ladder step starts from an input $(X(nP), X((n+1)P))$,
which we abbreviate as $L(n)$, and produces $L(2n)$ as output. Swapping
the two halves of the input, applying the same ladder step, and swapping the
two halves of the output produces $L(2n+1)$ instead; one way to see this is to
observe that $L(-n-1)$ is exactly the swap of $L(n)$.

Consequently one can reach $L(2n+e)$ for $e \in \{0, 1\}$ by starting from $L(n)$,
conditionally swapping, applying the ladder step, and conditionally swapping
again, where the condition bit is exactly e. A standard ladder reaches $L(n)$ by
applying this idea recursively. A standard constant-time ladder reaches $L(n)$ by
applying this idea for exactly $\ell$ steps, starting from $L(0)$, where n is known
in advance to be between 0 and $2^{\ell} - 1$. An alternate approach is to first add
to n an appropriate multiple of the order of P, producing an integer known
to be between (e.g.) $2^{\ell+1}$ and $2^{\ell+2} - 1$, and then start from $L(1)$. We use a
standard optimization, merging the conditional swap after a ladder step into
the conditional swap before the next ladder step, so that there are just $\ell + 1$
conditional swaps rather than $2\ell$.

One way to conditionally swap field elements x and x' using floating-point
arithmetic is to replace $(x, x')$ with $(x + b(x' - x), x' - b(x' - x))$, where b is the
condition bit, either 0 or 1. This takes three additions and one multiplication
(times 6 limbs, times 4 field elements to swap). It is better to use logic
instructions: replace each addition with xor, replace each multiplication with and, and
replace b with an all-1 or all-0 mask computed from b. On the Sandy Bridge,
logic instructions have low latency and are handled by the permutation unit,
which is much less of a bottleneck for us than the addition unit.
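Both variants can be sketched on a single limb in Python. The function names are ours, and the logic variant works on the 64-bit patterns of the doubles through the struct module, imitating what vector xor/and do per lane. The arithmetic variant is exact only when the limb difference $x' - x$ is exactly representable, which holds for the small limbs considered here.

```python
import struct

MASK64 = (1 << 64) - 1

def to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(u: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", u))[0]

def cswap_arith(b, x, xp):
    """Arithmetic version: 3 additions and 1 multiplication per limb."""
    d = b * (xp - x)
    return x + d, xp - d

def cswap_logic(b, x, xp):
    """Logic version: xor replaces addition, and replaces multiplication."""
    mask = -b & MASK64                  # all-1 mask if b == 1, all-0 mask if b == 0
    u, v = to_bits(x), to_bits(xp)
    d = mask & (u ^ v)
    return from_bits(u ^ d), from_bits(v ^ d)

print(cswap_logic(1, 3.0, -7.5), cswap_logic(0, 3.0, -7.5))   # (-7.5, 3.0) (3.0, -7.5)
```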

We further improve the performance of the conditional swap as follows. The 
M4 on the right side of Figure 4.3 is multiplying H of the left input by H of
the right input. This is commutative: it does not depend on whether the inputs 
are swapped. We therefore put the conditional swap after the first row of H 
computations, and multiply the H outputs directly, rather than multiplying the 
swap outputs. This trick has several minor effects and one important effect. 

A minor advantage is that this trick removes all use of the right half of the 
swap output; i.e., it replaces the conditional swap with a conditional move. This 
reduces the original 24 logic instructions to just 18. 

Another minor advantage is as follows. The Sandy Bridge has a vectorized 
conditional-select instruction vblendvpd. This instruction occupies the permu- 
tation unit for 2 cycles, so it is no better than the 4 traditional logic instructions 
for a conditional swap: a conditional swap requires two conditional selects. How- 
ever, this instruction is better than the 3 traditional logic instructions for a 
conditional move: a conditional move requires only one conditional select. This
replaces the original logic instructions with 6 conditional-select instructions,
consuming just 12 cycles.

A minor disadvantage is that the first M4 and S4 are no longer able to
share precomputations of multiplications by $2^{-127}$. This costs us 3 multiplication
instructions.

The important effect is that this trick reduces latency, allowing the M 4 to 
start much sooner. Adding this trick immediately produced a 5% reduction in 
our cycle counts. 

4.6. Total Operations. We treat Figure 2.4(b) as 2M4 + 3S4 + 3m4 + 4H.

The main computations of $h_i$, not counting precomputations and carries, cost
30 additions and 36 multiplications for each M4, 15 additions and 21 multiplications
for each S4, and 0 additions and 6 multiplications for each m4. The total
here is 105 additions and 153 multiplications.

The $M_4$ by $(1 : x_1/y_1 : x_1/z_1 : x_1/t_1)$ allows precomputations outside the
loop. The other $M_4$ consumes 5 multiplications for precomputations, and each $S_4$
consumes 8 multiplications for precomputations; the total here is 29 multiplications.
We had originally saved a few multiplications by sharing precomputations
between the first $S_4$ and the first $M_4$, but this is incompatible with the more
important trick described in Section 4.5.

There are a total of 24 additions in the four H computations, as explained in Section 4.4.
There are also 51 carries (counting the conversions described in Section 4.4 as 
carries), each consuming 3 additions and 2 multiplications, for a total of 153 
additions and 102 multiplications. 

The grand total is 282 additions and 284 multiplications, evidently requiring 
at least 284 cycles for each iteration of the main loop. Recall that there are 
various options to trade multiplications for additions: each S 4 has 5 precomputed 
doublings that can each be converted from 1 multiplication to 1 addition, and 
each carry can be converted from 3 additions and 2 multiplications to 4 additions 
and 0 multiplications (or 4 additions and 1 multiplication for $h_5 \to h_0$). We could
use either of these options to eliminate one multiplication, reducing the 284-cycle 
lower bound to 283 cycles, but to reduce latency we ended up instead using the 
first option to eliminate 10 multiplications and the second option to eliminate 35 
multiplications, obtaining a final total of 310 additions and 239 multiplications. 
These totals have been computer-verified.
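The bookkeeping above can be replayed mechanically. The following Python sketch only re-adds the per-operation counts quoted in this section. The split of the 35 eliminated multiplications into 17 ordinary carries plus one $h_5 \to h_0$ carry is our own inference: it is the only split consistent with the stated final totals of 310 additions and 239 multiplications.

```python
# (additions, multiplications) per operation, as quoted in Section 4.6.
MAIN = {"M4": (30, 36), "S4": (15, 21), "m4": (0, 6)}
USES = {"M4": 2, "S4": 3, "m4": 3}   # Figure 2.4(b) = 2 M4 + 3 S4 + 3 m4 + 4 H


def base_totals():
    adds = sum(USES[op] * MAIN[op][0] for op in MAIN)   # 105
    muls = sum(USES[op] * MAIN[op][1] for op in MAIN)   # 153
    muls += 5 + 3 * 8        # precomputations: the non-constant M4 and three S4
    adds += 4 * 6            # 24 additions in the four H
    adds += 51 * 3           # 51 carries at 3 additions ...
    muls += 51 * 2           # ... and 2 multiplications each
    return adds, muls


def with_tradeoffs(adds, muls):
    adds, muls = adds + 10, muls - 10        # 10 doublings: 1 mul -> 1 add
    adds, muls = adds + 17, muls - 2 * 17    # 17 carries: 3a + 2m -> 4a + 0m
    adds, muls = adds + 1, muls - 1          # one h5 -> h0 carry: 3a + 2m -> 4a + 1m
    return adds, muls


print(base_totals(), with_tradeoffs(*base_totals()))
```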

We wrote functions in assembly for $M_4$, $S_4$, etc., but were still over 500
cycles. Given the Sandy Bridge floating-point latencies, and the requirement to
keep two floating-point units constantly busy, we were already expecting in- 
struction scheduling to be much more of an issue for this software than for 
typical integer-arithmetic software. We used various standard optimization tech- 
niques that were already used in several previous DH speed records: we merged 
the functions into a single loop, reorganized many computations to save regis- 
ters, and eliminated many loads and stores. After building a new Sandy Bridge 
simulator and experimenting with different instruction schedules we ended up 
with our current loop, just 338 cycles, and a total of 88916 Sandy Bridge cycles
for scalar multiplication. The main loop explains 84838 of these cycles; the
remaining cycles are spent outside the ladder, mostly on converting $(x : y : z : t)$
to $(x/y : x/z : x/t)$ for output.

5 Cortex-A8 

The low-power ARM Cortex-A8 core is the CPU core in the iPad 1, iPhone 4,
Samsung Galaxy S, Motorola Droid X, Amazon Kindle 4, etc. Today a Cortex-A8
CPU, the Allwinner A10, costs just $5 in bulk and is widely used in low-cost
tablets, set-top boxes, etc. Like Sandy Bridge, Cortex-A8 is not the most recent
microarchitecture, but its very wide deployment and use make it a sensible choice
of platform for optimization and performance comparisons.

Bernstein and Schwabe in [15] (CHES 2012) analyzed the vector capabilities
of the Cortex-A8 for various cryptographic primitives, and in particular set a
new speed record for high-security DH, namely 460200 Cortex-A8 cycles. We do
much better, just 274593 Cortex-A8 cycles, measured on a Freescale i.MX515.
Our basic vectorization approach is the same for Cortex-A8 as for Sandy Bridge,
and many techniques are reused, but there are also many differences. The rest
of this section explains the details.

5.1. Cortex-A8 Vector Units. Each Cortex-A8 core has two 128-bit vector
units operating in parallel on vectors of four 32-bit integers or two 64-bit integers:

• The arithmetic port takes one cycle for vector addition, with latency 2; or
two cycles for vector multiplication (two 64-bit products ac, bd given 32-bit
inputs a, b and c, d), with latency 7. Logic operations also use the arithmetic
port.

• The load/store port handles loads, stores, and permutations. ARM’s Cortex-A8
documentation [5] indicates that the load/store port can carry out one
128-bit load every cycle. Beware, however, that there are throughput limits
on the L1 cache. We have found experimentally that the common TI
Sitara Cortex-A8 CPU (used, e.g., in the Beaglebone Black development
board) needs three cycles from one load until the next (this is what we
call “Cortex-A8-slow”), while other Cortex-A8 CPUs (“Cortex-A8-fast”) can
handle seven consecutive cycles of loads without penalty.

There are three obvious reasons for Cortex-A8 cycle counts to be much larger
than Sandy Bridge cycle counts: registers are only 128 bits, not 256 bits; there are
only 2 ports, not 6; and multiplication throughput is 1 every 2 cycles, not 1 every
cycle. However, there are also speedups on Cortex-A8. There is (as in Haswell’s
floating-point units; see Appendix B online) a vector multiply-accumulate
instruction with the same throughput as vector multiplication. A sequence of $m$
consecutive multiply-accumulate instructions that all accumulate into the same
register executes in $2m$ cycles (unlike Haswell), effectively reducing multiplication
latency from 7 to 1. Furthermore, Cortex-A8 multiplication produces 64-bit
integer products, while Sandy Bridge gives only 53-bit-mantissa products.
5.2. Representation. We decompose an integer $f$ modulo $2^{127} - 1$ into five
integer pieces in radix $2^{127/5}$, i.e., we write $f$ as
$f_0 + 2^{26} f_1 + 2^{51} f_2 + 2^{77} f_3 + 2^{102} f_4$.
Compared to Sandy Bridge, having 20% more room in 64-bit integers than in
53-bit floating-point mantissas allows us to reduce the number of limbs from 6
to 5. We require the small integers $f_0, f_1, f_2, f_3, f_4$ to be unsigned because this
reduces carry cost from 4 integer instructions to 3.
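As a concrete illustration of this radix, the limb boundaries are $\lceil 127i/5 \rceil$ for $i = 0, \dots, 4$, which gives limb widths of 26, 25, 26, 25 and 25 bits. The Python sketch below is our own illustration (Python's arbitrary-precision integers stand in for vector lanes, and `to_limbs`/`from_limbs` are hypothetical helper names); it splits a value into these unsigned limbs and reassembles it.

```python
P = (1 << 127) - 1
OFFSETS = [-(-127 * i // 5) for i in range(5)]                    # ceil(127*i/5)
WIDTHS = [b - a for a, b in zip(OFFSETS, OFFSETS[1:] + [127])]   # bits per limb


def to_limbs(f):
    """Split 0 <= f < 2^127 into five unsigned limbs f_0..f_4."""
    return [(f >> o) & ((1 << w) - 1) for o, w in zip(OFFSETS, WIDTHS)]


def from_limbs(limbs):
    """Evaluate f_0 + 2^26 f_1 + 2^51 f_2 + 2^77 f_3 + 2^102 f_4."""
    return sum(limb << o for limb, o in zip(limbs, OFFSETS))


f = P - 12345
print(OFFSETS, WIDTHS, from_limbs(to_limbs(f)) == f)
```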

We arrange four integers $x, y, z, t$ modulo $2^{127} - 1$ in five 128-bit vectors:
$(x_0, y_0, x_1, y_1)$; $(x_2, y_2, x_3, y_3)$; $(x_4, y_4, z_4, t_4)$; $(z_0, t_0, z_1, t_1)$; $(z_2, t_2, z_3, t_3)$.
This representation is designed to minimize permutations in M, S, and H. For
example, computing $(x_0 + z_0, y_0 + t_0, x_1 + z_1, y_1 + t_1)$ takes just one addition
without any permutations. The Cortex-A8 multiplications take two pairs of inputs at a
time, rather than four as on Sandy Bridge, so there is little motivation to put
$(x_0, y_0, z_0, t_0)$ into a vector.
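To see why this layout avoids permutations, the Python sketch below models each 128-bit vector as a 4-tuple of lanes. The names `layout` and `vadd` are hypothetical, and the tuples merely stand in for NEON registers.

```python
def layout(x, y, z, t):
    """Pack the 5-limb lists x, y, z, t into the five 4-lane vectors of Section 5.2."""
    return [
        (x[0], y[0], x[1], y[1]),
        (x[2], y[2], x[3], y[3]),
        (x[4], y[4], z[4], t[4]),
        (z[0], t[0], z[1], t[1]),
        (z[2], t[2], z[3], t[3]),
    ]


def vadd(u, v):
    """Lane-wise addition, standing in for a single vector add instruction."""
    return tuple(a + b for a, b in zip(u, v))


x, y = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
z, t = [100, 200, 300, 400, 500], [1000, 2000, 3000, 4000, 5000]
vec = layout(x, y, z, t)
# One vector addition yields (x0 + z0, y0 + t0, x1 + z1, y1 + t1) with no shuffles.
print(vadd(vec[0], vec[3]))
```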

5.3. Optimizing M. Given an integer $f$ as above and an integer
$g = g_0 + 2^{26} g_1 + 2^{51} g_2 + 2^{77} g_3 + 2^{102} g_4$, the product $fg$ modulo $2^{127} - 1$ is
$h = h_0 + 2^{26} h_1 + 2^{51} h_2 + 2^{77} h_3 + 2^{102} h_4$, with

$$
\begin{aligned}
h_0 &= f_0 g_0 + 2f_1 g_4 + 2f_2 g_3 + 2f_3 g_2 + 2f_4 g_1,\\
h_1 &= f_0 g_1 + f_1 g_0 + f_2 g_4 + 2f_3 g_3 + f_4 g_2,\\
h_2 &= f_0 g_2 + 2f_1 g_1 + f_2 g_0 + 2f_3 g_4 + 2f_4 g_3,\\
h_3 &= f_0 g_3 + f_1 g_2 + f_2 g_1 + f_3 g_0 + f_4 g_4,\\
h_4 &= f_0 g_4 + 2f_1 g_3 + f_2 g_2 + 2f_3 g_1 + f_4 g_0.
\end{aligned}
$$

(The factors of 2 come from wrap-around: for example, $f_1 g_4$ has weight
$2^{26 + 102} = 2^{128} \equiv 2 \pmod{2^{127} - 1}$.) There are 25 multiplications $f_i g_j$;
additions are free as part of multiply-accumulate instructions. We precompute
$2f_1, 2f_2, 2f_3, 2f_4$ so that these values can be reused for another multiplication.
These precomputations can be done by using either 4 shift or 4 addition instructions.
Both shift and addition use 1 cycle per instruction, but addition has a lower latency.
See Section 5.6 for the cost of carries.
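The five formulas above can be checked directly against big-integer arithmetic. The Python sketch below is an illustration, not the paper's NEON code: it precomputes $2f_1, \dots, 2f_4$ as described, forms the 25 products $f_i g_j$, and leaves the outputs uncarried. The assertions confirm that the limbs represent $fg \bmod 2^{127} - 1$ and that each uncarried limb still fits in 64 bits.

```python
P = (1 << 127) - 1
OFFSETS = [0, 26, 51, 77, 102]
WIDTHS = [26, 25, 26, 25, 25]


def to_limbs(f):
    return [(f >> o) & ((1 << w) - 1) for o, w in zip(OFFSETS, WIDTHS)]


def from_limbs(h):
    return sum(hi << o for hi, o in zip(h, OFFSETS))


def mul(f, g):
    """Uncarried product of two 5-limb integers modulo 2^127 - 1 (25 products)."""
    f0, f1, f2, f3, f4 = f
    g0, g1, g2, g3, g4 = g
    d1, d2, d3, d4 = 2 * f1, 2 * f2, 2 * f3, 2 * f4   # precomputed doublings
    h0 = f0 * g0 + d1 * g4 + d2 * g3 + d3 * g2 + d4 * g1
    h1 = f0 * g1 + f1 * g0 + f2 * g4 + d3 * g3 + f4 * g2
    h2 = f0 * g2 + d1 * g1 + f2 * g0 + d3 * g4 + d4 * g3
    h3 = f0 * g3 + f1 * g2 + f2 * g1 + f3 * g0 + f4 * g4
    h4 = f0 * g4 + d1 * g3 + f2 * g2 + d3 * g1 + f4 * g0
    return [h0, h1, h2, h3, h4]


a, b = P - 3, (1 << 126) + 12345
h = mul(to_limbs(a), to_limbs(b))
assert from_limbs(h) % P == a * b % P
assert all(x < (1 << 64) for x in h)      # uncarried limbs still fit in 64 bits
```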

5.4. Optimizing S. The idea of optimizing S on Cortex-A8 is quite similar to
that on Sandy Bridge; for details see Section 3.3. We state here only the operation count.
Besides precomputation and carry, we use 15 multiplication instructions; some
of those are actually multiply-accumulate instructions.
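For illustration, here is one way to reach 15 products for a squaring: merge the symmetric terms $f_i f_j = f_j f_i$ in the multiplication formulas of Section 5.3. This Python sketch is our own; the factors 2 and 4 would in practice come from precomputed doubled limbs, and the exact precomputations used in the software follow Section 3.3 and may differ.

```python
P = (1 << 127) - 1
OFFSETS = [0, 26, 51, 77, 102]
WIDTHS = [26, 25, 26, 25, 25]


def to_limbs(f):
    return [(f >> o) & ((1 << w) - 1) for o, w in zip(OFFSETS, WIDTHS)]


def from_limbs(h):
    return sum(hi << o for hi, o in zip(h, OFFSETS))


def sqr(f):
    """Uncarried square modulo 2^127 - 1 using 15 distinct products f_i f_j."""
    f0, f1, f2, f3, f4 = f
    h0 = f0 * f0 + 4 * (f1 * f4) + 4 * (f2 * f3)
    h1 = 2 * (f0 * f1) + 2 * (f2 * f4) + 2 * (f3 * f3)
    h2 = 2 * (f0 * f2) + 2 * (f1 * f1) + 4 * (f3 * f4)
    h3 = 2 * (f0 * f3) + 2 * (f1 * f2) + f4 * f4
    h4 = 2 * (f0 * f4) + 4 * (f1 * f3) + f2 * f2
    return [h0, h1, h2, h3, h4]


a = P - 7
assert from_limbs(sqr(to_limbs(a))) % P == a * a % P
```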

5.5. Optimizing m. For m we compute only $h_0 = cf_0$, $h_1 = cf_1$, $h_2 = cf_2$,
$h_3 = cf_3$, and $h_4 = cf_4$, again exploiting the small constants stated in Section 2.6.

Recall that we use unsigned representation. We always multiply absolute
values, then negate results as necessary by subtracting from $2^{129} - 4 = 4(2^{127} - 1)$,
which is zero modulo $2^{127} - 1$:
$n_0 = 2^{28} - 4 - h_0$, $n_1 = 2^{27} - 4 - h_1$, $n_2 = 2^{28} - 4 - h_2$, $n_3 = 2^{27} - 4 - h_3$, $n_4 = 2^{27} - 4 - h_4$.
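The negation constants are simply the limbs of $2^{129} - 4$: each is four times the largest value its limb can hold, so subtracting a reduced limb never goes negative. The Python sketch below (an illustration with hypothetical helper names) checks both facts.

```python
P = (1 << 127) - 1
OFFSETS = [0, 26, 51, 77, 102]
WIDTHS = [26, 25, 26, 25, 25]
# n_i = 4 * (2^w_i - 1) = 2^(w_i + 2) - 4, i.e. 2^28 - 4 or 2^27 - 4.
NEG = [(1 << (w + 2)) - 4 for w in WIDTHS]


def to_limbs(f):
    return [(f >> o) & ((1 << w) - 1) for o, w in zip(OFFSETS, WIDTHS)]


def from_limbs(h):
    return sum(hi << o for hi, o in zip(h, OFFSETS))


def neg(h):
    """Unsigned negation: subtract each limb from the matching limb of 2^129 - 4."""
    return [n - x for n, x in zip(NEG, h)]


h = to_limbs(P - 99)
n = neg(h)
assert from_limbs(NEG) == (1 << 129) - 4
assert all(x >= 0 for x in n)
assert from_limbs(n) % P == (-from_limbs(h)) % P
```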

Negating any subsequence of $x, y, z, t$ costs at most 5 vector subtractions.
Negating only $x$ or $y$, or both $x$ and $y$, costs only 3 subtractions, because our
representation keeps $x, y$ within 3 vectors. The same comment applies to $z$ and
$t$. The specific multiplications m in Section 2.6 end up requiring a total of 13
subtractions, with the same cost as 13 additions.
5.6. Carries. Each multiplication uses at worst 6 serial carries
$h_0 \to h_1 \to h_2 \to h_3 \to h_4 \to h_0 \to h_1$, each costing 3 additions. Various
carries are eliminated by the ideas of Section 3.4.
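A single carry in this representation is one shift, one logical and, and one addition (the cost breakdown given in Section 5.8). Because $2^{127} \equiv 1 \pmod{2^{127} - 1}$, the carry out of $h_4$ is added to $h_0$ without any multiplication. The Python sketch below applies a chain of six serial carries to arbitrary oversized limbs; it is our illustration, not the scheduled NEON code, and the test limbs are made up.

```python
P = (1 << 127) - 1
OFFSETS = [0, 26, 51, 77, 102]
WIDTHS = [26, 25, 26, 25, 25]


def from_limbs(h):
    return sum(hi << o for hi, o in zip(h, OFFSETS))


def carry(h, i):
    """Carry from limb i into the next limb: one shift, one and, one add."""
    c = h[i] >> WIDTHS[i]
    h[i] &= (1 << WIDTHS[i]) - 1
    h[(i + 1) % 5] += c        # h4 -> h0 wraps around since 2^127 = 1 mod P


def carry_chain(h):
    h = list(h)
    for i in (0, 1, 2, 3, 4, 0):   # h0 -> h1 -> h2 -> h3 -> h4 -> h0 -> h1
        carry(h, i)
    return h


h = [(1 << 60) + 5, 1 << 59, (1 << 61) - 1, (1 << 58) + 7, 1 << 57]
r = carry_chain(h)
assert from_limbs(r) % P == from_limbs(h) % P
```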

5.7. Hadamard Transform. See Appendix A online. 

5.8. Total Arithmetic. We view Figure 2.4(b) as $4M_2 + 6S_2 + 6m_2 + 4H$.
Here we combine x multiplications and y multiplications into a vectorized $M_2$,
and similarly combine z multiplications and t multiplications; this fits well with
the Cortex-A8 vector multiplication instruction, which outputs two products.

The main computations of $h_i$, not counting precomputations and carries, cost
0 additions and 25 multiplications for each M, 0 additions and 15 multiplications 
for each S, 0 additions and 5 multiplications for each m, and 15 additions for 
each H block. The total here is 60 additions and 220 multiplications. 

Each M costs 4 additions for precomputations, and each S also costs 4 ad- 
ditions for precomputations. Some precomputations can be reused. The cost of 
precomputations is 20 additions. 

There are 10 carry blocks using 6 carries each, and 6 carry blocks using 5 
carries each. Each carry consists of 1 shift, 1 addition, and 1 logical and. This 
cost is equivalent to 3 additions. There are another 13 additions needed to handle 
negation. Overall the carries cost 283 additions. Two conditional swaps, each 
costing 9 additions, sum up to 18 additions. 

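In total we have 381 additions and 220 multiplications in our inner loop. Since
each addition occupies the arithmetic port for 1 cycle and each multiplication for
2 cycles, the inner loop takes at least $381 + 2 \cdot 220 = 821$ cycles.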
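The same tally for Cortex-A8, as a Python sketch that only re-adds the numbers quoted in this section. The 1-cycle and 2-cycle weights are the arithmetic-port costs from Section 5.1. The last test line checks that 251 ladder steps at the measured 986 cycles give the quoted 247486.

```python
def inner_loop_bound():
    """Replay the Section 5.8 counts; returns (additions, multiplications, cycles)."""
    main_adds = 4 * 15                         # four Hadamard blocks
    muls = 4 * 25 + 6 * 15 + 6 * 5             # 4 M2, 6 S2, 6 m2
    precomp_adds = 20
    carries = 10 * 6 + 6 * 5                   # 90 carries
    carry_adds = 3 * carries + 13              # 3 ops per carry, plus negations
    swap_adds = 2 * 9                          # two conditional swaps
    adds = main_adds + precomp_adds + carry_adds + swap_adds
    # Arithmetic-port cost: 1 cycle per addition, 2 cycles per multiplication.
    return adds, muls, adds + 2 * muls


print(inner_loop_bound())
```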

We scheduled instructions carefully but ended up with some overhead beyond 
arithmetic: even though the arithmetic and the load/store unit can operate in 
parallel, latencies and the limited number of registers leave the arithmetic unit 
idle for some cycles. Sobole’s simulator at [48], which we found very helpful, 
reports 966 cycles. Actual measurements report 986 cycles; the 251 ladder steps 
thus account for 247486 of our 273349 cycles. 
Abstract. This paper presents a new projective coordinate system and
new explicit algorithms which together boost the speed of arithmetic 
in the divisor class group of genus 2 curves. The proposed formu- 
las generalise the use of Jacobian coordinates on elliptic curves, and 
their application improves the speed of performing cryptographic scalar 
multiplications in Jacobians of genus 2 curves over prime fields by an 
approximate factor of 1.25x. For example, on a single core of an Intel Core 
i7-3770M (Ivy Bridge), we show that replacing the previous best formulas 
with our new set improves the cost of generic scalar multiplications from 
243,000 to 195,000 cycles, and drops the cost of specialised GLV-style 
scalar multiplications from 166,000 to 129,000 cycles. 

Keywords: Genus 2, hyperelliptic curves, explicit formulas, Jacobian 
coordinates, scalar multiplication. 


1 Introduction 

Motivated by the popularity of low-genus curves in cryptography [29122123] , we 
put forward a new system of projective coordinates that facilitates efficient group 
law computations in the Jacobians of hyperelliptic curves of genus 2. This paper 
combines several techniques to arrive at explicit formulas that are significantly 
faster than those in previous works [25, 9]. The two main ingredients we use in
the derivation are: 

— The generalisation of Jacobian coordinates from the elliptic curve setting 
to the hyperelliptic curve setting: these coordinates essentially cast affine 
points into projective space according to the weights of x and y in the 
defining curve equation. While applying Jacobian coordinates to elliptic 
curves is straightforward, their application to hyperelliptic curves requires 
transferring the x-y weightings into weightings for the Mumford coordinates. 
As it does for the x-y coordinates in genus 1, this projection naturally 
balances the Mumford coordinates to facilitate substantial simplifications 
in the projective genus 2 group law formulas. 

— The adaptation of Meloni’s “co-Z” idea [28] to the genus 2 setting. Although
originally proposed in the context of addition-only (e.g. Fibonacci-style)
chains, this approach can also be used to gain performance in the more
meaningful context of binary addition chains. Moreover, this idea is especially
advantageous when used in conjunction with Jacobian coordinates.

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, PART I, LNCS 8873, pp. 338-357, 2014.

The application of the above techniques, as well as some further optimisations
discussed in the body of this paper, gives rise to the operation counts in Table 1;
the counts here include field multiplications (M), squarings (S), and multiplications
by curve constants (D). Here we make a brief comparison with the previous
works in [25] and [9], by considering the two most common operations in the
context of cryptographic scalar multiplications: a point doubling (denoted DBL),
and a mixed-doubling-and-addition (denoted mDBLADD) between two points
(see footnote 1). These two operations constitute the bottleneck of most
state-of-the-art scalar multiplication routines, since the multiplication of a point in
the Jacobian by an n-bit scalar typically requires $\alpha$ DBL operations and $\beta$ mDBLADD
operations, where $\alpha + \beta \approx n$. Thus, the improved operation counts in Table 1 give a
rough idea of the speedups that we can expect when plugging these formulas into an
existing genus 2 scalar multiplication routine that uses the formulas from [25]
or [9]. (We give a better indication of the improvements over previous formulas
by reporting concrete implementation numbers in Section 8.) As well as the
reduction in field multiplications indicated in Table 1, the explicit formulas in
this paper also require far fewer field additions than those in [25] and [9]. We
note that the biggest relative difference occurs in the mDBLADD column: among
other things, this difference results from the combination of the new coordinate
system with the extension of Meloni’s idea [28], which allows us to compute
mDBLADD operations independently of the curve constants. On the other hand,
when such curve constants are zero, certain operations in this paper become even
faster (relatively speaking): for example, on the two special families exhibiting
endomorphisms used in [7], the doubling formulas in [25] and [9] save 2D, while
the new operation count reported for DBL in Table 1 saves 3S + 2D to drop down
to 21M + 9S.


Table 1. Field operation counts obtained in this work, versus two previous works, for
the most common operations incurred during cryptographic scalar multiplications in
Jacobians of genus 2 curves of the form $C/K : y^2 = f(x)$, where $f(x)$ is of degree 5
and the characteristic of $K$ is greater than 5

authors               DBL              mADD        mDBLADD
Lange [25]            32M + 7S + 2D    36M + 5S    68M + 12S + 2D
Costello-Lauter [9]   30M + 9S + 2D    36M + 5S    66M + 14S + 2D
This work             21M + 12S + 2D   29M + 7S    52M + 11S


1 For genus 2 scalar multiplications, it is usually advantageous to convert precomputed
(lookup table) points to their affine representation using a shared inversion (see
Section 7). This is why the double-and-add operations involve a “mixed” addition.
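As a rough illustration of how Table 1 translates into speedups, the Python sketch below weights the operation counts under purely hypothetical assumptions: a squaring costs the same as a multiplication, multiplications by curve constants are free, and DBL and mDBLADD occur equally often. It ignores field additions and all implementation effects, so it is no substitute for the measurements in Section 8. Under these weights one DBL plus one mDBLADD costs 119 M-equivalents for both previous works versus 96 here, a ratio of about 1.24.

```python
# Table 1 entries as (M, S, D) per operation.
TABLE1 = {
    "Lange [25]": {"DBL": (32, 7, 2), "mDBLADD": (68, 12, 2)},
    "Costello-Lauter [9]": {"DBL": (30, 9, 2), "mDBLADD": (66, 14, 2)},
    "This work": {"DBL": (21, 12, 2), "mDBLADD": (52, 11, 0)},
}


def pair_cost(name, s_weight=1, d_weight=0):
    """M-equivalents of one DBL plus one mDBLADD under hypothetical weights."""
    total = 0
    for op in ("DBL", "mDBLADD"):
        m, s, d = TABLE1[name][op]
        total += m + s_weight * s + d_weight * d
    return total


for name in TABLE1:
    print(name, pair_cost(name), round(pair_cost("Lange [25]") / pair_cost(name), 2))
```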
While the formulas in this paper target Jacobians of imaginary genus 2 curves,
Gaudry showed in m that one can perform cryptographic scalar multiplications 
much more efficiently in the special case that the Jacobian of the curve C/K has 
K- rational two-torsion, by instead working on an associated Kummer surface. To 
illustrate the difference between working on the Kummer surface and working in 
the full Jacobian group, Gaudry’s analogous operation counts are a blazingly fast 
6M + 8S for DBL and 16M + 9S for mDBLADD. Referring back to Table 1, it is clear
that raw scalar multiplications on the Kummer surface will remain unrivalled by 
those in the full Jacobian group. However, there are several cryptographic caveats 
related to the Kummer surface that justify the continued exploration of fast 
algorithms for traditional arithmetic in the Jacobian. Namely, Kummer surfaces 
do not support generic additions, so while they are extremely fast in the realm 
of key exchange (where such additions are not necessary), it is not yet known 
how to efficiently use the Kummer surface in a wider realm of cryptographic 
settings, e.g. for general digital signatures (see footnote 2). Furthermore, the absence of generic
additions complicates the application of endomorphisms [7, §8.5], and from a 
more pragmatic standpoint, also prevents the use of standard precomputation 
techniques that exploit fixed system parameters (which give huge
speedups in practice, even over the Kummer surface [7, §7.4]). Thus, all genus 
2 implementations that either target signature schemes, use endomorphisms, or 
optimise the use of precomputation, are currently required to work in the full 
Jacobian group (see footnote 3); and in all of these cases, the formulas in this paper will now offer
the most efficient route. The upshot is that in popular practical scenarios the 
most efficient genus 2 cryptography is likely to result from a hybrid combination 
of operations on the Kummer surface and in the full Jacobian group. We illustrate 
this in Section [8] by benchmarking genus 2 curves in the context of ephemeral 
elliptic curve Diffie-Hellman (ECDHE) with perfect forward secrecy: to exploit 
the best of both worlds, Alice’s multiplications of the public generator P by each 
one of her ephemeral scalars a can make use of our new explicit formulas (and 
offline precomputations on P ) in the full Jacobian, and her resulting ephemeral 
public keys [a]P can then be mapped onto the corresponding Kummer surface, 
whose speed can be exploited by Bob in the computation of the shared secret 
[b]([a]P).

A set of Magma [8] scripts verifying all of the explicit formulas and operation 
counts can be found in the full version [21] , and is also publicly available at 

http://research.microsoft.com/en-us/downloads/37730278-3e37-47eb-91d1-cf889373677a/;

and a complete mixed-assembly-and-C implementation of all explicit formulas 
and scalar multiplication routines is publicly available at 

http://hhisil.yasar.edu.tr/files/hisil20140527jacobian.tar.gz

2 At least one exception here, as Gaudry points out, is the hashed version of ElGamal
signatures [46, §5.3].

3 Lubicz and Robert m have recently broken through the “full addition restriction” 
on Kummer varieties, but it is not yet clear how competitive their compatible addition 
formulas are in the context of raw scalar multiplications.
2 Preliminaries


For ease of exposition, we immediately restrict to the most cryptographically 
common case of genus 2 curves, where C is an imaginary hyperelliptic curve over 
a field $K$ of characteristic greater than 5. (In terms of a general coverage of all
genus 2 curves, we mention the interesting remaining scenarios in Section El) 
Every such curve can then be written as 

$$C/K : y^2 = f(x) := x^5 + a_3 x^3 + a_2 x^2 + a_1 x + a_0, \qquad (1)$$

where we note the absence of an $x^4$ term in $f(x)$; it can always be removed via
a trivial substitution thanks to $\mathrm{char}(K) \neq 5$.

Let $J_C$ denote the Jacobian of $C$. We assume that we are working with a
general point $P \in J_C(K)$, whose Mumford representation

$$P \cong (u(x), v(x)) = (x^2 + qx + r,\; sx + t) \in K[x] \times K[x] \;\cong\; (q, r, s, t) \in \mathbb{A}^4(K) \qquad (2)$$


encodes two affine points $(x_1, y_1), (x_2, y_2) \in C(K)$, where we assume that $x_1 \neq x_2$
so that these two points are not the same, nor are they the hyperelliptic
involution of one another. The Mumford coordinates $(q, r, s, t)$ of $P$ are uniquely
determined according to $u(x_1) = u(x_2) = 0$, $v(x_1) = y_1$ and $v(x_2) = y_2$. That is,

$$q = -(x_1 + x_2), \qquad r = x_1 x_2, \qquad s = \frac{y_1 - y_2}{x_1 - x_2}, \qquad t = \frac{x_1 y_2 - y_1 x_2}{x_1 - x_2}. \qquad (3)$$

From (1), (2) and (3), it is readily seen that

$$v(x)^2 - f(x) = 0 \ \text{in}\ K[x]/(u(x)), \qquad (4)$$

from which it follows that such general points $P$ lie in the intersection of two
hypersurfaces over $K$ [9, §3], given as

$$
\begin{aligned}
S_0 &: \; r\big(s^2 + q^3 - (2r - a_3)q - a_2\big) = t^2 - a_0,\\
S_1 &: \; q\big(s^2 + q^3 - (3r - a_3)q - a_2\big) = 2st - r(r - a_3) - a_1.
\end{aligned}
$$

We note that a simpler relation is found by taking $rS_1 - qS_0$, which cancels the
$s^2$ and $q^3$ terms.
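These relations are easy to check numerically. The Python sketch below works over a small prime field with parameters of our own choosing: $p = 1019$ and $(a_3, a_2, a_1, a_0) = (3, 5, 7, 11)$ are hypothetical and far from cryptographic sizes, and it assumes Python 3.8+ for `pow(x, -1, p)`. It builds the Mumford coordinates (3) from two affine points with distinct x-coordinates and confirms that they satisfy both $S_0$ and $S_1$.

```python
p = 1019                        # hypothetical small prime
a3, a2, a1, a0 = 3, 5, 7, 11    # hypothetical curve coefficients


def f(x):
    return (x**5 + a3 * x**3 + a2 * x**2 + a1 * x + a0) % p


def curve_points():
    """Affine points (x, y), y != 0, with at most one point per x-coordinate."""
    roots = {}
    for y in range(1, p):
        roots.setdefault(y * y % p, y)
    return [(x, roots[f(x)]) for x in range(p) if f(x) in roots]


def mumford(P1, P2):
    """Mumford coordinates (q, r, s, t) of the divisor supported on P1, P2, as in (3)."""
    (x1, y1), (x2, y2) = P1, P2
    inv = pow((x1 - x2) % p, -1, p)
    q = -(x1 + x2) % p
    r = x1 * x2 % p
    s = (y1 - y2) * inv % p
    t = (x1 * y2 - y1 * x2) * inv % p
    return q, r, s, t


def on_hypersurfaces(q, r, s, t):
    """Check S0 and S1 modulo p."""
    e0 = r * (s * s + q**3 - (2 * r - a3) * q - a2) - (t * t - a0)
    e1 = q * (s * s + q**3 - (3 * r - a3) * q - a2) - (2 * s * t - r * (r - a3) - a1)
    return e0 % p == 0 and e1 % p == 0


pts = curve_points()
print(on_hypersurfaces(*mumford(pts[0], pts[1])))
```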

Our driving motivation for improving the explicit formulas for arithmetic
in the Jacobian is the application of enhancing the fundamental operation in
curve-based cryptosystems: the scalar multiplication $[k]P$ of an integer $k \in \mathbb{Z}$
by a general point $P$ in $J_C$. Such scalar multiplications are computed using
a sequence of point doubling and addition operations, and so a common way
of comparing different sets of addition formulas is to tally the number of field
multiplications (M), field squarings (S), and field additions (denoted by a) that
each point operation incurs. In cryptographic contexts, the input and output
points are typically required to be in their unique affine form, whilst intermediate
computations are carried out in projective space to avoid inversions. Thus,
the most commonly reported operation counts include: DBL, which refers to the
addition of a Jacobian point in projective form to itself; ADD, which refers to the
addition between two distinct points in projective form; mADD, which refers to
the mixed addition between a projective point and an affine point; and mDBLADD,
which refers to the combined doubling of a projective point and subsequent
addition of the result with an affine point.

4 We adopted the notation $(q, r, s, t)$ over $(u_1, u_0, v_1, v_0)$ to avoid additional
subscripts/superscripts when working with distinct elements in $J_C$.

H. Hisil and C. Costello

As is done in [25, §5-6], in this paper we focus on deriving formulas for the most
common cases of arithmetic in $J_C$. This set of formulas is enough to perform and
benchmark scalar multiplications in $J_C$, since the possible input/output cases are
extremely dense amongst all possible scenarios, i.e. for random input points $P$
and scalars $k$, the cases not covered by these formulas have an exponentially small
probability of being encountered in the scalar multiplication routine (see [23,
§1.2] for a similar discussion). Nevertheless, the set of formulas we present is still
far from a complete and cryptographically adequate coverage, so it is important
to distinguish exactly which input/output cases they do apply to. We clarify this
in Assumption 1 below, and return to this discussion in §7.3.

Assumption 1 (General Points and Operations in $J_C$). Throughout this
paper, we assume that all input and output points are “general” points in $J_C$:
we say that $P \in J_C$ is general if the Mumford representation of $P$ encodes two
distinct affine points $(x_1, y_1)$ and $(x_2, y_2)$ on $C$, where $x_1 \neq x_2$. Moreover, all
operations in this paper are of the form $P_1 + P_2 = P_3$, where we assume that
$P_1$, $P_2$ and $P_3$ are general points and that we are in one of two cases: (i) either
$P_1 = P_2$, in which case we are computing the “doubling” $P_3 = [2]P_1$, where we
further assume that neither of the two x-coordinates encoded by $P_1$ coincides with
the two encoded by $P_3$, or (ii) that of the six points encoded by $P_1$, $P_2$ and $P_3$,
no two share the same x-coordinate.

3 Extending Jacobian Coordinates to Jacobians 

Let $\lambda$ be a nonzero element in $K$. Over fields of large characteristic, Jacobian
coordinates have proven to be a natural and efficient way to work projectively
on elliptic curves in short Weierstrass form $\mathcal{E}/K : y^2 = x^3 + ax + b$. Indeed,
in cryptographic contexts, using the triple $(\lambda^2 X : \lambda^3 Y : \lambda Z) \in \mathbb{P}(2, 3, 1)(K)$ to
represent the affine point $(X/Z^2, Y/Z^3) \in \mathbb{A}^2(K)$ on $\mathcal{E}$ was suggested by Miller
in his seminal 1985 paper [23, p. 424], and his comment that this representation
“appears best” still holds true after decades of further exploration: Jacobian
coordinates (and extended variants) remain the most efficient way to work on
such general Weierstrass curves [4]. Moreover, the weightings $\mathrm{wt}(x) = 2$ and
$\mathrm{wt}(y) = 3$ are the orders of the poles of the functions $x$ and $y$ at the point at
infinity on $\mathcal{E}$.

In the context of imaginary hyperelliptic curves of the form
$$C/K : y^2 = x^5 + a_3 x^3 + a_2 x^2 + a_1 x + a_0,$$
the analogous weightings are

$$\mathrm{wt}(x) = 2, \quad \text{and} \quad \mathrm{wt}(y) = 5, \qquad (6)$$

under which the affine point $(X/Z^2, Y/Z^5) \in \mathbb{A}^2(K)$ is represented by the triple
$(\lambda^2 X : \lambda^5 Y : \lambda Z) \in \mathbb{P}(2, 5, 1)(K)$, which lies on

$$C/K : Y^2 = X^5 + a_3 X^3 Z^4 + a_2 X^2 Z^6 + a_1 X Z^8 + a_0 Z^{10}. \qquad (7)$$

Indeed, the weights $\mathrm{wt}(x) = 2$ and $\mathrm{wt}(y) = 5$ are the orders of the poles of $x$ and
$y$ at the (unique) point at infinity on $C$. Since we perform arithmetic using the
Mumford coordinates in $J_C$, rather than the x-y coordinates on $C$, we transfer
the above weightings across to the Mumford coordinates via Equation (3), which
yields

$$\mathrm{wt}(q) = \mathrm{wt}(x), \quad \mathrm{wt}(r) = 2\,\mathrm{wt}(x), \quad \mathrm{wt}(s) = \mathrm{wt}(y) - \mathrm{wt}(x), \quad \mathrm{wt}(t) = \mathrm{wt}(y). \qquad (8)$$

Combining (6) and (8) then gives

$$\mathrm{wt}(q) = 2, \quad \mathrm{wt}(r) = 4, \quad \mathrm{wt}(s) = 3, \quad \mathrm{wt}(t) = 5, \qquad (9)$$

which suggests the use of $(\lambda^2 Q : \lambda^4 R : \lambda^3 S : \lambda^5 T : \lambda Z) \in \mathbb{P}(2, 4, 3, 5, 1)(K)$ to
represent the affine point

$$(q, r, s, t) = \left(\frac{Q}{Z^2}, \frac{R}{Z^4}, \frac{S}{Z^3}, \frac{T}{Z^5}\right) \in \mathbb{A}^4(K). \qquad (10)$$

Equation (10) is at the heart of this paper. We found these weightings to
be highly advantageous for group law computations: the Mumford coordinates
balance naturally under this projection, and significant simplifications occur
regularly in the derivation of the corresponding explicit formulas. This coordinate
system is referred to as Jacobian coordinates in this paper. We note that, in line
with Assumption 1, we will not work with the full projective closure of the affine
part in $\mathbb{P}(2, 4, 3, 5, 1)(K)$, but rather with the affine patch where $Z \neq 0$.
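The weighted projective representation (10) can be illustrated in a few lines. The Python sketch below uses a hypothetical small prime $p = 1019$ and hypothetical helper names (and assumes Python 3.8+ for `pow(Z, -1, p)`); it maps an affine $(q, r, s, t)$ into Jacobian coordinates, rescales by an arbitrary $\lambda$ with the weights $(2, 4, 3, 5, 1)$, and checks that every representative decodes to the same affine point.

```python
p = 1019  # hypothetical small prime


def to_jacobian(q, r, s, t, Z):
    """Affine (q, r, s, t) -> (Q : R : S : T : Z) with weights (2, 4, 3, 5, 1)."""
    return (q * Z**2 % p, r * Z**4 % p, s * Z**3 % p, t * Z**5 % p, Z % p)


def rescale(P, lam):
    """Multiply a representative by lambda according to the weights of P(2,4,3,5,1)."""
    Q, R, S, T, Z = P
    return (lam**2 * Q % p, lam**4 * R % p, lam**3 * S % p, lam**5 * T % p, lam * Z % p)


def to_affine(Q, R, S, T, Z):
    """Recover (Q/Z^2, R/Z^4, S/Z^3, T/Z^5); requires Z != 0."""
    iz = pow(Z, -1, p)
    return (Q * iz**2 % p, R * iz**4 % p, S * iz**3 % p, T * iz**5 % p)


affine = (10, 20, 30, 40)
P = to_jacobian(*affine, Z=7)
assert to_affine(*P) == affine
assert to_affine(*rescale(P, 123)) == affine   # every lambda gives the same point
```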

Just as in [25, §6], we found it useful to introduce an additional coordinate
(independent of $Z$) in the denominator of the two coordinates corresponding to
the v-polynomial in the Mumford representation. So, in addition to the Jacobian
coordinate $Z$, we include the coordinate $W$ and use the projective six-tuple
$(\lambda^2 Q : \lambda^4 R : \lambda^3 \mu S : \lambda^5 \mu T : \lambda Z : \mu W)$ to represent the affine point

$$(q, r, s, t) = \left(\frac{Q}{Z^2}, \frac{R}{Z^4}, \frac{S}{Z^3 W}, \frac{T}{Z^5 W}\right) \in \mathbb{A}^4(K) \qquad (11)$$

for some nonzero $\mu$ in $K$. This coordinate system is referred to as auxiliary
Jacobian coordinates in this paper.
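The same check extends to the auxiliary coordinates (11), where $W$ scales only the two v-polynomial coordinates. Again this is a sketch with a hypothetical prime $p = 1019$ and hypothetical helper names, assuming Python 3.8+; the two scalings by $\lambda$ and $\mu$ are independent.

```python
p = 1019  # hypothetical small prime


def to_aux(q, r, s, t, Z, W):
    """Affine (q, r, s, t) -> (Q : R : S : T : Z : W) in auxiliary Jacobian coordinates."""
    return (q * Z**2 % p, r * Z**4 % p, s * Z**3 * W % p, t * Z**5 * W % p, Z % p, W % p)


def rescale(P, lam, mu):
    """Apply the weights (lambda^2, lambda^4, lambda^3 mu, lambda^5 mu, lambda, mu)."""
    Q, R, S, T, Z, W = P
    return (lam**2 * Q % p, lam**4 * R % p, lam**3 * mu * S % p,
            lam**5 * mu * T % p, lam * Z % p, mu * W % p)


def aux_to_affine(Q, R, S, T, Z, W):
    """Recover (Q/Z^2, R/Z^4, S/(Z^3 W), T/(Z^5 W)); requires Z, W != 0."""
    iz, iw = pow(Z, -1, p), pow(W, -1, p)
    return (Q * iz**2 % p, R * iz**4 % p, S * iz**3 * iw % p, T * iz**5 * iw % p)


affine = (10, 20, 30, 40)
P = to_aux(*affine, Z=7, W=11)
assert aux_to_affine(*P) == affine
assert aux_to_affine(*rescale(P, 123, 456)) == affine
```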

Remark 1. We note the distinction between the above coordinate weightings
and the weightings used by Lange, which were also said to “generalise the
concept of Jacobian coordinates . . . from elliptic to hyperelliptic curves” [25, §6].
In terms of the first projective coordinate $Z$, Lange used
$(q, r, s, t) = (Q/Z^2, R/Z^2, S/Z^3, T/Z^3)$. Although these weight the u- and v-polynomials of
a point with the same (Jacobian) weightings as the x- and y-coordinates on an
elliptic curve, the derivation of the weightings in (10) draws a closer analogy
with the use of Jacobian coordinates in genus 1. This is why we dubbed the
weightings used in this work as “Jacobian coordinates”.

4 Adopting the “co-Z” Approach 

With the aim of improving addition formulas on elliptic curves, Meloni [28] put
forward a nice idea that is particularly suited to working in Jacobian coordinates.
In the explicit addition of two elliptic curve points $(X_1 : Y_1 : Z_1)$ and $(X_2 : Y_2 : Z_2)$
in $\mathbb{P}(2, 3, 1)(K)$, which respectively correspond to the points $(X_1/Z_1^2, Y_1/Z_1^3)$
and $(X_2/Z_2^2, Y_2/Z_2^3)$ in $\mathbb{A}^2(K)$, Meloni observed that almost all expressions of
the form $Z_1^i Z_2^j$ can completely vanish if $Z_1 = Z_2$. That is, the sum of the points
$(X_1 : Y_1 : Z_1)$ and $(X_2 : Y_2 : Z_1)$ can be written as an expression of the
form $(X_3 Z_1^2 : Y_3 Z_1^3 : Z_3 Z_1)$, which is projectively equivalent to $(X_3 : Y_3 : Z_3)$;
here $X_3$ and $Y_3$ depend only on $X_1$, $Y_1$, $X_2$ and $Y_2$, so now it is only $Z_3$ that
depends on $Z_1$. Since two projective points are unlikely to share the same
Z-coordinate in general, the method starts by updating one or both of the input
points to force this equivalence. The obvious way to do this is to respectively
cross-multiply $(X_1 : Y_1 : Z_1)$ and $(X_2 : Y_2 : Z_2)$ into $(X_1 Z_2^2 : Y_1 Z_2^3 : Z_1 Z_2)$ and
$(X_2 Z_1^2 : Y_2 Z_1^3 : Z_2 Z_1)$, but as it stands, performing this update would incur a
significant overhead. The observation that is key to making this “co-Z” approach
advantageous is that, in the context of scalar multiplications, these updated
values (or the main subexpressions within them) are often already computed in
the previous operation [28, p. 192], so this update can be performed either for
free, or with a much smaller overhead.
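To make the co-Z idea concrete in its original elliptic-curve setting, the Python sketch below implements the co-Z addition with update for short Weierstrass curves in Jacobian coordinates, as it commonly appears in the literature following Meloni, and checks it against affine addition. The curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{1019}$ and all helper names are hypothetical, the code assumes Python 3.8+, and this is not the genus 2 co-ZW routine of this paper. Besides $P_1 + P_2$ with $Z_3 = Z(X_2 - X_1)$, it returns a representative of $P_1$ that shares $Z_3$, which is the kind of already-computed update described above.

```python
p = 1019       # hypothetical small prime
a, b = 2, 3    # hypothetical curve y^2 = x^3 + a x + b


def curve_points():
    roots = {}
    for y in range(1, p):
        roots.setdefault(y * y % p, y)
    return [(x, roots[(x**3 + a * x + b) % p])
            for x in range(p) if (x**3 + a * x + b) % p in roots]


def to_jac(P, Z):
    x, y = P
    return (x * Z**2 % p, y * Z**3 % p, Z % p)


def from_jac(X, Y, Z):
    iz = pow(Z, -1, p)
    return (X * iz**2 % p, Y * iz**3 % p)


def affine_add(P, Q):
    (x1, y1), (x2, y2) = P, Q
    lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p


def zaddu(P1, P2):
    """Co-Z addition: P1, P2 share Z; returns P1 + P2 and P1, both with Z3."""
    X1, Y1, Z = P1
    X2, Y2, Z2 = P2
    assert Z == Z2, "inputs must share the same Z-coordinate"
    A = (X2 - X1) ** 2 % p
    B, C = X1 * A % p, X2 * A % p
    D = (Y2 - Y1) ** 2 % p
    X3 = (D - B - C) % p
    Y3 = ((Y2 - Y1) * (B - X3) - Y1 * (C - B)) % p
    Z3 = Z * (X2 - X1) % p
    return (X3, Y3, Z3), (B, Y1 * (C - B) % p, Z3)


pts = curve_points()
P, Q = pts[1], pts[5]
R, P_new = zaddu(to_jac(P, 5), to_jac(Q, 5))
assert from_jac(*R) == affine_add(P, Q)
assert from_jac(*P_new) == P
```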

Meloni did not apply his idea to classical “double-and-add” style addition 
chains, but subsequent papers [26119] showed how his approach could be used 
to enhance performance in such binary chains. In genus 2 however, successful 
transferral of the “co-Z” idea has not yet been achieved: the work in m also 
uses non-binary addition chains, and crucially, it was performed without access to 
the hyperelliptic analogue of Jacobian coordinates (those which work in stronger 
synergy with Meloni’s idea). 

Our adaptation of the “co-Z” approach requires that both the $Z$ and $W$
coordinates are the same, for two different input points. The first projective
formulas we derive in Section 6 are for the “co-ZW” addition between the two
points $P_1 = (Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1)$ and $P_2 = (Q_2 : R_2 : S_2 : T_2 : Z_1 : W_1)$,
and this routine is then used as a subroutine for all subsequent operations (except
for standalone doublings).
5 Arithmetic in Affine Coordinates with New Common Subexpressions


The explicit formulas for arithmetic in genus 2 Jacobians are significantly more
complicated than their elliptic curve counterparts, so it is especially useful to
start the derivation by looking for common subexpressions and advantageous
orderings in the affine versions of the formulas (i.e., before the introduction of
more coordinates complicates the situation further). Our derivation follows that
of [9], but it is important to point out that the resulting affine formulas have
been refined by grouping new subexpressions throughout; these groupings were
strategically chosen to exploit the symmetries of the q and r coordinates, and
especially for the application of Jacobian coordinates that follows in Section 6.
In what follows, we give the affine formulas for general point additions and
general point doublings respectively. From Section 2, recall the abbreviated
notation $(q, r, s, t) \in \mathbb{A}^4(K)$ for the point in $J_C$ with Mumford representation
$(x^2 + qx + r, sx + t)$.

Let $P_1 = (q_1, r_1, s_1, t_1)$, $P_2 = (q_2, r_2, s_2, t_2)$ and $P_1 + P_2 =: P_3 = (q_3, r_3, s_3, t_3)$
be points in $J_C$ satisfying Assumption 1. The choice of the three subexpressions

$$
\begin{aligned}
A &:= (t_1 - t_2)\big(q_2(q_1 - q_2) - (r_1 - r_2)\big) - r_2(q_1 - q_2)(s_1 - s_2),\\
B &:= (r_1 - r_2)\big(q_2(q_1 - q_2) - (r_1 - r_2)\big) - r_2(q_1 - q_2)^2,\\
C &:= (q_1 - q_2)(t_1 - t_2) - (r_1 - r_2)(s_1 - s_2)
\end{aligned}
$$

is key to our refined derivation. The point $P_3$ is then given by

$$
\begin{aligned}
q_3 &= (q_1 - q_2) + 2\frac{A}{C} - \frac{B^2}{C^2},\\
r_3 &= (q_1 - q_2)\frac{A}{C} + \frac{A^2}{C^2} + (q_1 + q_2)\frac{B^2}{C^2} - (s_1 + s_2)\frac{B}{C},\\
s_3 &= (r_1 - r_3)\frac{C}{B} - q_3(q_1 - q_3)\frac{C}{B} + (q_1 - q_3)\frac{A}{B} - s_1,\\
t_3 &= (r_1 - r_3)\frac{A}{B} - r_3(q_1 - q_3)\frac{C}{B} - t_1.
\end{aligned}
\qquad (12)
$$

These formulas are used to derive the projective co-ZW addition formulas in §6.1, which form the basis for all of the other (non-doubling) formulas in this work.

Let $P_1 = (q_1, r_1, s_1, t_1)$ and $[2]P_1 =: P_3 = (q_3, r_3, s_3, t_3)$ be points in $J_C$ satisfying Assumption 1. Again, it is particularly useful to make use of three subexpressions:


$$
\begin{aligned}
A &:= \bigl((q_1^2 - 4r_1 + a_3)q_1 - a_2 + s_1^2\bigr)(q_1 s_1 - t_1) + (3q_1^2 - 2r_1 + a_3)r_1 s_1,\\
B &:= 2(q_1 s_1 - t_1)t_1 - 2r_1 s_1^2,\\
C &:= \bigl((q_1^2 - 4r_1 + a_3)q_1 - a_2 + s_1^2\bigr)s_1 + (3q_1^2 - 2r_1 + a_3)t_1.
\end{aligned}
$$

The point $P_3$ is then given by

$$
\begin{aligned}
q_3 &= 2\frac{A}{C} - \frac{B^2}{C^2},\\
r_3 &= \frac{A^2}{C^2} + 2q_1\frac{B^2}{C^2} - 2s_1\frac{B}{C},\\
s_3 &= (r_1 - r_3)\frac{C}{B} - q_3(q_1 - q_3)\frac{C}{B} + (q_1 - q_3)\frac{A}{B} - s_1,\\
t_3 &= (r_1 - r_3)\frac{A}{B} - r_3(q_1 - q_3)\frac{C}{B} - t_1.
\end{aligned}
\tag{13}
$$


These formulas are used to derive projective doubling formulas in §6.5. The formulas in (12) and (13) agree with those of Costello-Lauter [9].


6 Projective Arithmetic in Extended Jacobian Coordinates

In this section we derive all of the explicit formulas that are needed for the scalar multiplication routines we describe in Section 7. The formulas are summarised in Table 2 below, where we immediately note the extension of the auxiliary Jacobian coordinates discussed in Section 3 to include $W^2$; it is advantageous to carry this additional coordinate between consecutive operations because it is often already computed en route to the output point, and therefore comes for free as input into the following operation. We refer to this extended version of auxiliary Jacobian coordinates as extended Jacobian coordinates. Table 2 reports two sets of operation counts: the “plain” count, which corresponds to formulas derived with the aim of minimising the total number of field operations, and the “trade-offs” count, which corresponds to formulas that lower the number of multiplications at the expense of additional squarings and additions.

If $W^2$ is dropped from the coordinate system, and we work only with auxiliary Jacobian coordinates $(Q : R : S : T : Z : W)$, then both DBL and DBLa2a3zero would require one extra squaring (in both the “plain” and “trade-off” formulas). The only other change resulting from this abbreviated coordinate system would be in the “trade-off” version of ADD, where a squaring would revert to a multiplication. All other operation counts would remain unchanged.

Following on from the discussion in Section 4, in §6.1 we start the derivations by using the affine addition formulas in (12) to develop projective formulas for zwADD; these are then used in the derivation of the formulas for ADD in §6.2, for mADD in §6.3 and for mDBLADD in §6.4. Finally, we use the affine doubling formulas in (13) to develop projective formulas for DBL in §6.5.

Table 2. A summary of the explicit formulas derived in this section for various operations in the Jacobian, $J_C$, of an imaginary hyperelliptic curve $C/K$ of genus 2, with $\mathrm{char}(K) > 5$

| operation in $J_C$ | description of operation | derived in | field operations, “plain” | field operations, w. “trade-offs” |
|---|---|---|---|---|
| zwADD | $(Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1) + (Q_2 : R_2 : S_2 : T_2 : Z_1 : W_1)$ | §6.1 | 25M + 3S + 22a | 23M + 4S + 40a |
| ADD | $(Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2) + (Q_2 : R_2 : S_2 : T_2 : Z_2 : W_2 : W_2^2)$ | §6.2 | 41M + 7S + 22a | 35M + 12S + 56a |
| mADD | $(Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2) + (Q_2 : R_2 : S_2 : T_2 : 1 : 1 : 1)$ | §6.3 | 32M + 5S + 22a | 29M + 7S + 44a |
| mDBLADD | $[2](Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2) + (Q_2 : R_2 : S_2 : T_2 : 1 : 1 : 1)$ | §6.4 | 57M + 8S + 42a | 52M + 11S + 82a |
| DBL | $[2](Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2)$ | §6.5 | 26M + 8S + 2D + 25a | 21M + 12S + 2D + 52a |
| DBLa2a3zero | $[2](Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2)$ (when $a_2 = a_3 = 0$) | §6.5 | 25M + 6S + 22a | 21M + 9S + 48a |

6.1 Projective co-ZW Addition (zwADD)

Let $P_1 = (Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1)$, $P_2 = (Q_2 : R_2 : S_2 : T_2 : Z_1 : W_1)$, and $P_1 + P_2 =: P_3 = (Q_3 : R_3 : S_3 : T_3 : Z_3 : W_3 : W_3^2)$ represent three points in $J_C$ satisfying Assumption 1. We emphasize that $P_1$ and $P_2$ need not contain $W^2$, which is why both are given in auxiliary Jacobian coordinates; however, the output $P_3$ is in extended Jacobian coordinates. The projective form of (12) in extended Jacobian coordinates corresponds to the following. We define the subexpressions

$$
\begin{aligned}
A &:= (T_1 - T_2)\bigl(Q_2(Q_1 - Q_2) - (R_1 - R_2)\bigr) - R_2(Q_1 - Q_2)(S_1 - S_2),\\
B &:= (R_1 - R_2)\bigl(Q_2(Q_1 - Q_2) - (R_1 - R_2)\bigr) - R_2(Q_1 - Q_2)^2,\\
C &:= (Q_1 - Q_2)(T_1 - T_2) - (R_1 - R_2)(S_1 - S_2).
\end{aligned}
$$

The point $P_3$ is then given by

$$
\begin{aligned}
W_3 &= W_1[B],\\
Q_3 &= \bigl(Q_1[C^2] - Q_2[C^2]\bigr) + 2AC - W_3^2,\\
R_3 &= \bigl(Q_1[C^2] - Q_2[C^2] + AC\bigr)AC + \bigl(Q_1[C^2] + Q_2[C^2]\bigr)W_3^2 - S_1[C^3B] - S_2[C^3B],\\
S_3 &= \bigl(R_1[C^4] - R_3\bigr) + (AC - Q_3)\bigl(Q_1[C^2] - Q_3\bigr) - S_1[C^3B],\\
T_3 &= \bigl(R_1[C^4] - R_3\bigr)AC - R_3\bigl(Q_1[C^2] - Q_3\bigr) - T_1[C^5B],\\
Z_3 &= Z_1[C].
\end{aligned}
\tag{14}
$$

This operation, referred to as zwADD, not only computes $P_3$, but also produces the subexpressions $Q_1[C^2]$, $R_1[C^4]$, $S_1[C^3B]$, $T_1[C^5B]$, $Z_1[C]$, $W_1[B]$ and $W_1^2[B^2]$; if desired, these can be used to update $P_1$ to be of the form

$$
P_1 = (Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2) = \bigl(Q_1[C^2] : R_1[C^4] : S_1[C^3B] : T_1[C^5B] : Z_1[C] : W_1[B] : W_1^2[B^2]\bigr),
$$

so that it now has the same $Z$, $W$ and $W^2$ coordinates as $P_3$. The combination of the zwADD operation and this update will be denoted using the syntax

$$
(P_3, P_1') := P_1 + P_2,
$$

where $P_1'$ is the updated (but projectively equivalent) version of $P_1$.
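As a sanity check on (12) and (14), the following Python sketch evaluates both sets of formulas on random inputs modulo $p = 2^{127} - 1$ and confirms that the projective output, once scaled back, equals the affine output. It assumes the coordinate weights implied by the cross-multiplication in §6.2, namely $q = Q/Z^2$, $r = R/Z^4$, $s = S/(Z^3W)$ and $t = T/(Z^5W)$; the helper names (`affine_add`, `zw_add`, `to_projective`, `to_affine`) are ours. Since (12) and (14) are rational identities, the random inputs need not lie on a curve. This checks algebraic consistency only; it is neither optimised nor constant-time.

```python
import random

P = 2**127 - 1  # the Jac1271 prime; any large prime works for this check


def inv(x):
    """Modular inverse (three-argument pow, Python 3.8+)."""
    return pow(x % P, -1, P)


def affine_add(P1, P2):
    """Affine addition (12) on points (q, r, s, t) of J_C."""
    q1, r1, s1, t1 = P1
    q2, r2, s2, t2 = P2
    dq, dr, ds, dt = q1 - q2, r1 - r2, s1 - s2, t1 - t2
    A = dt * (q2 * dq - dr) - r2 * dq * ds
    B = dr * (q2 * dq - dr) - r2 * dq * dq
    C = dq * dt - dr * ds
    iB, iC = inv(B), inv(C)
    q3 = (dq + 2 * A * iC - B * B * iC * iC) % P
    r3 = (dq * A * iC + A * A * iC * iC
          + (q1 + q2) * B * B * iC * iC - (s1 + s2) * B * iC) % P
    s3 = ((r1 - r3) * C * iB - q3 * (q1 - q3) * C * iB
          + (q1 - q3) * A * iB - s1) % P
    t3 = ((r1 - r3) * A * iB - r3 * (q1 - q3) * C * iB - t1) % P
    return (q3, r3, s3, t3)


def zw_add(P1, P2):
    """Projective co-ZW addition (14); P1 and P2 must share Z and W."""
    Q1, R1, S1, T1, Z1, W1 = P1[:6]
    Q2, R2, S2, T2, Z2, W2 = P2[:6]
    assert (Z1, W1) == (Z2, W2), "zwADD needs a shared Z and W"
    dQ, dR, dS, dT = Q1 - Q2, R1 - R2, S1 - S2, T1 - T2
    A = (dT * (Q2 * dQ - dR) - R2 * dQ * dS) % P
    B = (dR * (Q2 * dQ - dR) - R2 * dQ * dQ) % P
    C = (dQ * dT - dR * dS) % P
    AC, C2 = A * C % P, C * C % P
    W3 = W1 * B % P                        # W1[B]
    W3sq = W3 * W3 % P                     # W1^2[B^2]
    Q1C2, Q2C2 = Q1 * C2 % P, Q2 * C2 % P
    R1C4 = R1 * C2 * C2 % P
    S1C3B, S2C3B = S1 * C2 * C * B % P, S2 * C2 * C * B % P
    T1C5B = T1 * C2 * C2 * C * B % P
    Q3 = (Q1C2 - Q2C2 + 2 * AC - W3sq) % P
    R3 = ((Q1C2 - Q2C2 + AC) * AC + (Q1C2 + Q2C2) * W3sq - S1C3B - S2C3B) % P
    S3 = ((R1C4 - R3) + (AC - Q3) * (Q1C2 - Q3) - S1C3B) % P
    T3 = ((R1C4 - R3) * AC - R3 * (Q1C2 - Q3) - T1C5B) % P
    Z3 = Z1 * C % P                        # Z1[C]
    return (Q3, R3, S3, T3, Z3, W3, W3sq)


def to_projective(pt, Z, W):
    q, r, s, t = pt
    return (q * Z**2 % P, r * Z**4 % P, s * Z**3 * W % P, t * Z**5 * W % P, Z, W)


def to_affine(Pt):
    Q, R, S, T, Z, W = Pt[:6]
    iZ, iW = inv(Z), inv(W)
    return (Q * iZ**2 % P, R * iZ**4 % P, S * iZ**3 * iW % P, T * iZ**5 * iW % P)


rng = random.Random(2014)
for _ in range(20):
    a1 = tuple(rng.randrange(P) for _ in range(4))
    a2 = tuple(rng.randrange(P) for _ in range(4))
    Z, W = rng.randrange(1, P), rng.randrange(1, P)
    P3 = zw_add(to_projective(a1, Z, W), to_projective(a2, Z, W))
    assert to_affine(P3) == affine_add(a1, a2)
```

The explicit assertion in `zw_add` makes the co-ZW precondition visible; in the paper's routines it holds by construction.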
6.2 Projective Addition (ADD)

Rather than producing lengthy formulas for additions, we use a simple construction that exploits zwADD. Let $P_1 = (Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2)$, $P_2 = (Q_2 : R_2 : S_2 : T_2 : Z_2 : W_2 : W_2^2)$, and $P_1 + P_2 =: P_3 = (Q_3 : R_3 : S_3 : T_3 : Z_3 : W_3 : W_3^2)$ represent three points in $J_C$ satisfying Assumption 1. We can then cross-multiply to define the points in auxiliary Jacobian coordinates

$$
\begin{aligned}
P_1' &:= \bigl(Q_1[Z_2^2] : R_1[Z_2^4] : S_1[Z_2^3W_2] : T_1[Z_2^5W_2] : Z_1[Z_2] : W_1[W_2]\bigr),\\
P_2' &:= \bigl(Q_2[Z_1^2] : R_2[Z_1^4] : S_2[Z_1^3W_1] : T_2[Z_1^5W_1] : Z_2[Z_1] : W_2[W_1]\bigr).
\end{aligned}
$$

Observe that $P_1' = P_1$ and $P_2' = P_2$ as projective points, but that $P_1'$ and $P_2'$ now share the same $Z$ and $W$ coordinates. This means that we can use the zwADD operation defined in §6.1 to compute $P_3 = P_1 + P_2$ as $(P_3, P_1'') := P_1' + P_2'$. Observe that $P_1'' = P_1$, and that $P_1''$ will share the same $Z$, $W$ and $W^2$ coordinates as $P_3$. We note that this update of $P_1$ into $P_1''$ can be useful in the generation of lookup tables [26], but is generally not useful during the main loop.
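The cross-multiplication step can be sketched in Python as follows. This is our illustration rather than the paper's code: the helper names are hypothetical, and the coordinate weights $q = Q/Z^2$, $r = R/Z^4$, $s = S/(Z^3W)$, $t = T/(Z^5W)$ are read off from the scalings above. The sketch checks that $P_1'$ and $P_2'$ represent the same points as $P_1$ and $P_2$ while sharing $Z$ and $W$, and that with $Z_2 = W_2 = 1$ the construction reduces to the mADD update of §6.3, in which $P_1$ is left unchanged.

```python
P = 2**127 - 1


def to_affine(Pt):
    """(Q : R : S : T : Z : W [: W^2]) -> (q, r, s, t)."""
    Q, R, S, T, Z, W = Pt[:6]
    iZ, iW = pow(Z, -1, P), pow(W, -1, P)
    return (Q * iZ**2 % P, R * iZ**4 % P,
            S * iZ**3 * iW % P, T * iZ**5 * iW % P)


def cross_multiply(P1, P2):
    """Rescale P1 and P2 so that both have Z = Z1*Z2 and W = W1*W2."""
    Q1, R1, S1, T1, Z1, W1 = P1[:6]
    Q2, R2, S2, T2, Z2, W2 = P2[:6]
    P1p = (Q1 * Z2**2 % P, R1 * Z2**4 % P, S1 * Z2**3 * W2 % P,
           T1 * Z2**5 * W2 % P, Z1 * Z2 % P, W1 * W2 % P)
    P2p = (Q2 * Z1**2 % P, R2 * Z1**4 % P, S2 * Z1**3 * W1 % P,
           T2 * Z1**5 * W1 % P, Z2 * Z1 % P, W2 * W1 % P)
    return P1p, P2p


# Full addition (ADD): both inputs in extended Jacobian coordinates.
P1 = (11, 22, 33, 44, 5, 6, 36)
P2 = (7, 8, 9, 10, 3, 4, 16)
P1p, P2p = cross_multiply(P1, P2)
assert P1p[4:] == P2p[4:]               # shared Z and W
assert to_affine(P1p) == to_affine(P1)  # same point as P1
assert to_affine(P2p) == to_affine(P2)  # same point as P2

# Mixed addition (mADD): Z2 = W2 = 1 leaves P1 untouched.
P1p, P2p = cross_multiply(P1, (7, 8, 9, 10, 1, 1, 1))
assert P1p == P1[:6]
```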

6.3 Projective Mixed Addition (mADD) 

In a similar way, let Pi = (Qi \ R\ \ S\ \ T\ \ Z\ \ W\ : VL 2 ), P 2 = (Q2 • R 2 • 

S 2 : T 2 : 1 : 1 : 1), and Pi + P 2 =: P 3 = (Q 3 : R 3 : S 3 : T 3 : Z 3 : W 3 : W 2 ) 

represent three points in Jc satisfying Assumption [l] This time we only need to 
update P 2 into P 2 , which is performed in auxiliary Jacobian coordinates as 

P' 2 := [Q 2 [Zl] : R 2 [Z\] : S 2 [ZfWx] : T 2 \_Z\W\\ : [^] : [W^]) , 

where we observe that Pi and P 2 now have the same Z and W coordinates. 
Subsequently, using the zwADD operation from 96.11 allows P3 = Pi + P 2 to be 
computed by (P3, P[) := P\ + P 2 . 

6.4 Projective Mixed Doubling-and-Addition (mDBLADD)

Let $P_1 = (Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2)$, $P_2 = (Q_2 : R_2 : S_2 : T_2 : 1 : 1 : 1)$, and $[2]P_1 + P_2 =: P_3 = (Q_3 : R_3 : S_3 : T_3 : Z_3 : W_3 : W_3^2)$ represent three points in $J_C$ satisfying Assumption 1. To compute $[2]P_1 + P_2$, we schedule the higher level operations in the form $(P_1 + P_2) + P_1$ (see [11] and [26] for the same high level scheduling). This means that mDBLADD can be computed using an mADD operation followed by a zwADD operation: the mADD returns $P_1 + P_2$ together with an updated copy of $P_1$ that shares its $Z$ and $W$ coordinates, which is exactly the input that zwADD requires. (Consequently, we must also assume that $P_1$, the intermediate point $P_1 + P_2$, and the output point $[2]P_1 + P_2 =: P_3$ satisfy Assumption 1.)


6.5 Projective Doubling (DBL) 

Let $P_1 = (Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2)$ and $[2]P_1 =: P_3 = (Q_3 : R_3 : S_3 : T_3 : Z_3 : W_3 : W_3^2)$ represent two points in $J_C$ satisfying Assumption 1. The projective form of (13) in extended Jacobian coordinates corresponds to the following. We define the subexpressions

$$
\begin{aligned}
A &:= \Bigl(\bigl(Q_1(Q_1^2 - 4R_1) + (Q_1 - (a_2/a_3)Z_1^2)\,a_3Z_1^4\bigr)W_1^2 + S_1^2\Bigr)(Q_1S_1 - T_1) + (3Q_1^2 - 2R_1 + a_3Z_1^4)W_1^2R_1S_1,\\
B &:= 2(Q_1S_1 - T_1)T_1 - 2R_1S_1^2,\\
C &:= \Bigl(\bigl(Q_1(Q_1^2 - 4R_1) + (Q_1 - (a_2/a_3)Z_1^2)\,a_3Z_1^4\bigr)W_1^2 + S_1^2\Bigr)S_1 + (3Q_1^2 - 2R_1 + a_3Z_1^4)W_1^2T_1.
\end{aligned}
$$

We can then write $P_3$ as

$$
\begin{aligned}
W_3 &= W_1[B],\\
Q_3 &= 2AC - W_3^2,\\
R_3 &= (AC)^2 + 2Q_1[C^2]W_3^2 - 2S_1[C^3B],\\
S_3 &= \bigl(R_1[C^4] - R_3\bigr) + (AC - Q_3)\bigl(Q_1[C^2] - Q_3\bigr) - S_1[C^3B],\\
T_3 &= \bigl(R_1[C^4] - R_3\bigr)AC - R_3\bigl(Q_1[C^2] - Q_3\bigr) - T_1[C^5B],\\
Z_3 &= Z_1[C].
\end{aligned}
$$

The DBL operation not only computes $P_3$, but also produces the subexpressions $Q_1[C^2]$, $R_1[C^4]$, $S_1[C^3B]$, $T_1[C^5B]$, $Z_1[C]$, $W_1[B]$ and $W_1^2[B^2]$; if desired, these can be used to update $P_1$ into

$$
P_1 = (Q_1 : R_1 : S_1 : T_1 : Z_1 : W_1 : W_1^2) = \bigl(Q_1[C^2] : R_1[C^4] : S_1[C^3B] : T_1[C^5B] : Z_1[C] : W_1[B] : W_1^2[B^2]\bigr),
$$

in order to share the same $Z$, $W$ and $W^2$ coordinates with $P_3$. We define the operation DBLa2a3zero to be a doubling in the special case that the curve constants $a_2$ and $a_3$ are zero.
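The following Python sketch checks, on random inputs modulo $2^{127} - 1$, that the projective doubling above agrees with the affine doubling (13) after rescaling, both for random $a_2, a_3$ and for the DBLa2a3zero case $a_2 = a_3 = 0$. It is an illustration under stated assumptions: the coordinate weights are taken to be $q = Q/Z^2$, $r = R/Z^4$, $s = S/(Z^3W)$, $t = T/(Z^5W)$; the names `E` and `F` for the two bracketed factors shared by $A$ and $C$ are ours; and the term $(Q_1 - (a_2/a_3)Z_1^2)\,a_3Z_1^4$ is expanded as $a_3Q_1Z_1^4 - a_2Z_1^6$ so that no division by $a_3$ is needed.

```python
import random

P = 2**127 - 1


def inv(x):
    return pow(x % P, -1, P)


def affine_dbl(pt, a2, a3):
    """Affine doubling (13)."""
    q1, r1, s1, t1 = pt
    E = (q1 * q1 - 4 * r1 + a3) * q1 - a2 + s1 * s1
    F = 3 * q1 * q1 - 2 * r1 + a3
    A = E * (q1 * s1 - t1) + F * r1 * s1
    B = 2 * (q1 * s1 - t1) * t1 - 2 * r1 * s1 * s1
    C = E * s1 + F * t1
    iB, iC = inv(B), inv(C)
    q3 = (2 * A * iC - B * B * iC * iC) % P
    r3 = (A * A * iC * iC + 2 * q1 * B * B * iC * iC - 2 * s1 * B * iC) % P
    s3 = ((r1 - r3) * C * iB - q3 * (q1 - q3) * C * iB
          + (q1 - q3) * A * iB - s1) % P
    t3 = ((r1 - r3) * A * iB - r3 * (q1 - q3) * C * iB - t1) % P
    return (q3, r3, s3, t3)


def proj_dbl(Pt, a2, a3):
    """Projective DBL in extended Jacobian coordinates."""
    Q1, R1, S1, T1, Z1, W1 = Pt[:6]
    Zsq = Z1 * Z1 % P
    Z4 = Zsq * Zsq % P
    W1sq = W1 * W1 % P
    # (Q1 - (a2/a3) Z1^2) a3 Z1^4 written as a3 Q1 Z1^4 - a2 Z1^6
    E = ((Q1 * (Q1 * Q1 - 4 * R1) + a3 * Q1 * Z4 - a2 * Zsq * Z4) * W1sq
         + S1 * S1) % P
    F = (3 * Q1 * Q1 - 2 * R1 + a3 * Z4) * W1sq % P
    A = (E * (Q1 * S1 - T1) + F * R1 * S1) % P
    B = (2 * (Q1 * S1 - T1) * T1 - 2 * R1 * S1 * S1) % P
    C = (E * S1 + F * T1) % P
    AC, C2 = A * C % P, C * C % P
    W3 = W1 * B % P
    W3sq = W3 * W3 % P
    Q1C2 = Q1 * C2 % P
    R1C4 = R1 * C2 * C2 % P
    S1C3B = S1 * C2 * C * B % P
    T1C5B = T1 * C2 * C2 * C * B % P
    Q3 = (2 * AC - W3sq) % P
    R3 = (AC * AC + 2 * Q1C2 * W3sq - 2 * S1C3B) % P
    S3 = ((R1C4 - R3) + (AC - Q3) * (Q1C2 - Q3) - S1C3B) % P
    T3 = ((R1C4 - R3) * AC - R3 * (Q1C2 - Q3) - T1C5B) % P
    Z3 = Z1 * C % P
    return (Q3, R3, S3, T3, Z3, W3, W3sq)


def to_projective(pt, Z, W):
    q, r, s, t = pt
    return (q * Z**2 % P, r * Z**4 % P, s * Z**3 * W % P,
            t * Z**5 * W % P, Z, W, W * W % P)


def to_affine(Pt):
    Q, R, S, T, Z, W = Pt[:6]
    iZ, iW = inv(Z), inv(W)
    return (Q * iZ**2 % P, R * iZ**4 % P,
            S * iZ**3 * iW % P, T * iZ**5 * iW % P)


rng = random.Random(7)
for a2, a3 in [(rng.randrange(P), rng.randrange(P)), (0, 0)]:  # DBL, DBLa2a3zero
    for _ in range(10):
        pt = tuple(rng.randrange(P) for _ in range(4))
        Z, W = rng.randrange(1, P), rng.randrange(1, P)
        assert to_affine(proj_dbl(to_projective(pt, Z, W), a2, a3)) == affine_dbl(pt, a2, a3)
```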

7 Implementation 

We chose two different curves to showcase the explicit formulas derived in the 
previous section, both of which target the 128-bit security level. 

The first curve was found in the colossal point counting effort undertaken by Gaudry and Schost [18]. From a security standpoint, it is twist-secure and is not considered to be special (e.g. it has a large discriminant); from a performance standpoint, it is defined over the arithmetically advantageous field with $p = 2^{127} - 1$ and has optimal cofactors, such that the curve supports a Gaudry-style Kummer surface implementation [16]. This is the same Kummer surface that was used to set speed records in [7] and [3]. We chose the Jacobian of this curve to illustrate the performance that is gained when using our new formulas inside a general “double-and-add” scalar multiplication routine.

The second curve supports a 4-dimensional Gallant-Lambert-Vanstone (GLV) decomposition [15]. Over prime fields, requiring 4-dimensional GLV imposes that the Jacobian has complex multiplication (CM) by a special field; in this case it is $\mathbb{Q}(\zeta_5)$. This specialness means that we cannot hope to find a twist-secure curve over a particular prime, but rather that we must search over many primes. In the same vein as [?, §8.3], we also wanted this curve to support a rational Gaudry-style Kummer surface. This curve is defined over the prime field with $p = 2^{128} - c$, where $c = 7689975$ is the smallest $c > 0$ such that a curve with CM by $\mathbb{Q}(\zeta_5)$ over $\mathbb{F}_p$ is twist-secure with optimal cofactors⁵. This curve was chosen to exhibit the performance that is gained when using our new formulas inside a GLV-style multiexponentiation; in particular, each step of the multiexponentiation requires only an mDBLADD operation, and this is where our explicit formulas offer the largest relative speedup over the previous ones.

7.1 Working on the Gaudry-Schost Jacobian 

Let $p = 2^{127} - 1$, and define the following constants in $\mathbb{F}_p$: $a := 11$, $b := -22$, $c := -19$, $d := -3$, $e := 1 + \sqrt{-833/363}$ and $f := 1 - \sqrt{-833/363}$. For the Rosenhain invariants $\lambda$, $\mu$ and $\nu$ determined by these constants, the curve

$$
C_{\mathrm{Ros}}/\mathbb{F}_p : y^2 = x(x - 1)(x - \lambda)(x - \mu)(x - \nu)
$$

is such that $\#J_{C_{\mathrm{Ros}}} = 2^4 \cdot r$ and $\#J_{C'_{\mathrm{Ros}}} = 2^4 \cdot r'$, where $r$ and $r'$ are 250- and 251-bit primes respectively [18], and where $C'_{\mathrm{Ros}}$ is the quadratic twist of $C_{\mathrm{Ros}}$. The coefficient of $x^4$ in $C_{\mathrm{Ros}}$ is $\alpha = -(1 + \lambda + \mu + \nu)$, and we choose to zero it under the transformation $\psi : C_{\mathrm{Ros}} \to \tilde{C}$, $(x, y) \mapsto (x + \alpha/5, y)$. The resulting curve, $\tilde{C}$, has a coefficient of $x^3$ which is a fourth power in $\mathbb{F}_p$;⁶ let it be $u^{-4}$, where we chose $u = 19859741192276546142105456991319328298$. We can then use the map $\phi : \tilde{C} \to C$, $(x, y) \mapsto (xu^2, yu^5)$ to work with the isomorphic curve $C/\mathbb{F}_p : y^2 = x^5 + x^3 + a_2x^2 + a_1x + a_0$, where the coefficient of $x^3$ being 1 saves a multiplication inside every point doubling. We use the name Jac1271 for the Jacobian $J_C$, and use the name Kum1271 for the associated Kummer surface $\mathcal{K}_C$; this is defined by the above constants $a, b, c, d$ (see [18]).
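The effect of $\psi$ and $\phi$ on the defining quintic can be checked with a little polynomial arithmetic. The Python sketch below is our illustration and uses toy values $\lambda = 2$, $\mu = 3$, $\nu = 5$, not the actual Gaudry-Schost invariants. It expands $x(x-1)(x-\lambda)(x-\mu)(x-\nu)$ modulo $p = 2^{127} - 1$, rewrites it in the new coordinate $X = x + \alpha/5$ (that is, evaluates it at $X - \alpha/5$), and confirms that the $X^4$ coefficient vanishes. It then applies $(x, y) \mapsto (u^2x, u^5y)$, under which the coefficient of $x^i$ is multiplied by $u^{10-2i}$, so a cubic coefficient equal to $u^{-4}$ becomes 1.

```python
P = 2**127 - 1


def poly_mul(f, g):
    """Product of polynomials given as coefficient lists (lowest degree first)."""
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] = (h[i + j] + a * b) % P
    return h


def poly_shift(f, c):
    """Coefficients of f(X - c), by Horner's rule in the variable (X - c)."""
    out = [f[-1] % P]
    for a in reversed(f[:-1]):
        out = poly_mul(out, [(-c) % P, 1])
        out[0] = (out[0] + a) % P
    return out


def rosenhain(lam, mu, nu):
    """Monic quintic x(x - 1)(x - lam)(x - mu)(x - nu)."""
    f = [1]
    for root in (0, 1, lam, mu, nu):
        f = poly_mul(f, [(-root) % P, 1])
    return f


def scale(g, u):
    """Quintic of the curve reached via (x, y) -> (u^2 x, u^5 y)."""
    return [c * pow(u, 10 - 2 * i, P) % P for i, c in enumerate(g)]


lam, mu, nu = 2, 3, 5                       # toy values only
f = rosenhain(lam, mu, nu)
alpha = f[4]                                # coefficient of x^4
assert alpha == (-(1 + lam + mu + nu)) % P
g = poly_shift(f, alpha * pow(5, -1, P) % P)  # g(X) = f(X - alpha/5)
assert g[4] == 0 and g[5] == 1

u = 1234567                                 # arbitrary unit
h = scale(g, u)
assert h[5] == 1 and h[3] == pow(u, 4, P) * g[3] % P  # h[3] == 1 iff g[3] == u^-4
```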

5 It is relatively straightforward to show that if $J_C$ has CM by $\mathbb{Q}(\zeta_5)$ and full rational two-torsion, then either $J_C$ or $J_{C'}$ must contain a point of order 5; thus, the optimal cofactors are 16 and 80.

6 If the coefficient of $x^3$ in $\tilde{C}$ were not a fourth power, one could still use this form of transformation to achieve another “small” coefficient, or in this case, work on the twist instead.

In Section 8 we report two new sets of implementation numbers on Jac1271. First, we benchmark a generic scalar multiplication, using both the old and the new formulas, to illustrate the performance boost given by this work in the general case. In addition, we benchmark a fixed-base scalar multiplication, which uses the new formulas and takes advantage of precomputations on a public generator to give large speedups on Jac1271. In the context of ECDHE, this second benchmark corresponds to the “key_gen” phase, which complements the performance numbers for the “shared_secret” scalar multiplications on Kum1271 in [7] and [3]. (We discuss some caveats related to this Jacobian/Kummer combination in §7.3.) To tie these two sets of performance numbers together, we also benchmark the map from Jac1271 onto Kum1271, which was made explicit in the AVIsogenies library and for general points in $J_{C_{\mathrm{Ros}}}$ is given as

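$$
\Psi : J_{C_{\mathrm{Ros}}} \to \mathcal{K}_C, \quad (x^2 + qx + r,\ sx + t) \mapsto (X : Y : Z : T),
$$

where

$$
\begin{aligned}
X &= a\bigl(r(\mu - r)(\lambda + q + \nu) - t^2\bigr), & Y &= b\bigl(r(\nu\lambda - r)(1 + q + \mu) - t^2\bigr),\\
Z &= c\bigl(r(\nu - r)(\lambda + q + \mu) - t^2\bigr), & T &= d\bigl(r(\mu\lambda - r)(1 + q + \nu) - t^2\bigr).
\end{aligned}
\tag{16}
$$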

For practical scenarios like ECDHE, it is fortunate that we only need the map in this direction, as the pullback map from $\mathcal{K}_C$ to $J_{C_{\mathrm{Ros}}}$ is much more complicated [16, §4.3]. Since we compute in $J_C$ (rather than $J_{C_{\mathrm{Ros}}}$), we actually need to compute the composition of $\Psi$ with $(\psi\phi)^{-1}$, which when extended to general points in $J_C$ is

$$
(\psi\phi)^{-1} : J_C \to J_{C_{\mathrm{Ros}}}, \quad (x^2 + qx + r,\ sx + t) \mapsto (x^2 + q'x + r',\ s'x + t'),
$$

with $q' = u^{-2}q + 2\alpha/5$, $r' = u^{-4}r + (\alpha/5)q' - (\alpha/5)^2$, $s' = u^{-3}s$ and $t' = u^{-5}t + (\alpha/5)s'$. Assuming that the input point in $J_C$ is in extended Jacobian coordinates, the operation count for the full map $\Psi' = \Psi \circ (\psi\phi)^{-1}$ from $J_C$ to $\mathcal{K}_C$ is 1I + 31M + 2S + 19a; we benchmark it alongside the scalar multiplications in Section 8.
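The change of variables behind $(\psi\phi)^{-1}$ can be checked numerically. In the Python sketch below (our illustration; function names are hypothetical), `pull_to_rosenhain` implements the formulas for $q', r', s', t'$ on an affine Mumford pair, under the convention that a Rosenhain point $(x, y)$ corresponds to $(u^2(x + \alpha/5),\ u^5y)$ on $C$. The check builds a pair from two $x$-values on the Rosenhain side, moves it to $C$, recomputes its Mumford representation there and pulls it back. `kummer_image` is a direct transcription of (16) and is not tested against an independent Kummer implementation; note also that this sketch works on affine input, so it does not reproduce the 1I + 31M + 2S + 19a count for extended Jacobian input.

```python
import random

P = 2**127 - 1


def inv(x):
    return pow(x % P, -1, P)


def pull_to_rosenhain(D, u, alpha):
    """(psi phi)^{-1}: affine Mumford pair (q, r, s, t) on J_C -> J_{C_Ros}."""
    q, r, s, t = D
    iu = inv(u)
    c = alpha * inv(5) % P                      # alpha / 5
    q_ = (pow(iu, 2, P) * q + 2 * c) % P
    r_ = (pow(iu, 4, P) * r + c * q_ - c * c) % P
    s_ = pow(iu, 3, P) * s % P
    t_ = (pow(iu, 5, P) * t + c * s_) % P
    return (q_, r_, s_, t_)


def kummer_image(D, lam, mu, nu, abcd):
    """Direct transcription of the map Psi in (16)."""
    q, r, s, t = D
    a, b, c, d = abcd
    tt = t * t
    X = a * (r * (mu - r) * (lam + q + nu) - tt) % P
    Y = b * (r * (nu * lam - r) * (1 + q + mu) - tt) % P
    Z = c * (r * (nu - r) * (lam + q + mu) - tt) % P
    T = d * (r * (mu * lam - r) * (1 + q + nu) - tt) % P
    return (X, Y, Z, T)


# Build (x1, y1), (x2, y2) on the Rosenhain side with y = s_ros*x + t_ros,
# map them to C, recompute (q, r, s, t) there, then pull back.
rng = random.Random(1)
u, alpha = rng.randrange(1, P), rng.randrange(P)
x1, x2, s_ros, t_ros = (rng.randrange(P) for _ in range(4))
c = alpha * inv(5) % P
X1, X2 = pow(u, 2, P) * (x1 + c) % P, pow(u, 2, P) * (x2 + c) % P
Y1 = pow(u, 5, P) * (s_ros * x1 + t_ros) % P
Y2 = pow(u, 5, P) * (s_ros * x2 + t_ros) % P
s = (Y1 - Y2) * inv(X1 - X2) % P
t = (Y1 - s * X1) % P
D_C = ((-(X1 + X2)) % P, X1 * X2 % P, s, t)
assert pull_to_rosenhain(D_C, u, alpha) == ((-(x1 + x2)) % P, x1 * x2 % P, s_ros, t_ros)
```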

To draw a fair comparison against prior works, we inserted our formulas into the software made publicly available by Bos et al. [7], which itself employed the previous best formulas. (We tweaked both sets of formulas for Jac1271 to take advantage of the constant $a_3 = 1$.) This software computes the scalar multiplications on Jac1271 using an adaptation of the left-to-right signed sliding window recoding from [?] with a window size of $w = 5$, where the lookup table consists of 8 points and is constructed using the same approach as in [26, §4]. The timings are presented in Section 8.
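For readers unfamiliar with signed window recodings, the following Python sketch shows the standard width-$w$ NAF (wNAF). It is a close relative of, but not necessarily identical to, the adapted left-to-right signed sliding window recoding in the software of [7]. With $w = 5$ every nonzero digit is odd and lies in $\{\pm 1, \pm 3, \ldots, \pm 15\}$, so the lookup table holds the 8 points $P, 3P, \ldots, 15P$, matching the table size quoted above. The group operations are passed in as hypothetical callables `add`, `dbl` and `neg`, so the demo runs in the integers rather than in $J_C$.

```python
def wnaf(k, w=5):
    """Width-w NAF of k >= 1, least significant digit first.

    Every nonzero digit is odd with |d| <= 2**(w-1) - 1, so only the
    2**(w-2) odd multiples P, 3P, ..., (2**(w-1) - 1)P need to be stored.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = k % (1 << w)
            if d >= 1 << (w - 1):
                d -= 1 << w
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def scalar_mul(k, P, add, dbl, neg, w=5):
    """Left-to-right evaluation over the wNAF digits of k >= 1."""
    table = [P]                          # table[i] = (2i + 1) P
    twoP = dbl(P)
    for _ in range(2 ** (w - 2) - 1):
        table.append(add(table[-1], twoP))
    acc = None
    for d in reversed(wnaf(k, w)):
        if acc is not None:
            acc = dbl(acc)
        if d != 0:
            T = table[abs(d) // 2]
            if d < 0:
                T = neg(T)
            acc = T if acc is None else add(acc, T)
    return acc


# Toy check in the additive group of integers, where k * 1 == k.
k = 0xC0FFEE123456789
assert scalar_mul(k, 1, lambda a, b: a + b, lambda a: 2 * a, lambda a: -a) == k
assert all(d % 2 == 1 and abs(d) <= 15 for d in wnaf(k) if d)
```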

7.2 Working on the Jacobian of a GLV curve 

Let $p = 2^{128} - 7689975$ and define $C/\mathbb{F}_p : y^2 = x^5 + 7^{10}$. The Jacobian groups
$J_C$ and $J_{C'}$ have cardinalities $\#J_C = 2^4 \cdot 5 \cdot r$ and $\#J_{C'} = 2^4 \cdot r'$, where

$$
\begin{aligned}
r &= (2^{252} + 375576928331233691782146792677798267213584131651764404159)/5,\\
r' &= 2^{252} - 375576928331887882475846226038533397089218679777223482485
\end{aligned}
$$

are both prime.

The implementation of a 4-dimensional GLV scalar multiplication in $J_C$ follows that which is described in [7, §6]; again, we wrapped their GLV software around both their old and our new formulas for a fair comparison. We note that both instances were made to use the above curve, which we refer to as GLV128c.

Practically speaking, it does not make as much sense to benchmark GLV128c in the same ECDHE style as we discussed for Jac1271 and Kum1271. If there is enough storage to exploit a long-term public generator $P$, then the presence of endomorphisms is essentially redundant in the key_gen phase, since multiples of $P$ can then be precomputed offline without using an endomorphism. On the shared_secret side, where variable-base scalar multiplications are performed on fresh inputs, our implementations show that a 4-dimensional decomposition on GLV128c is still slightly slower than a Kummer surface scalar multiplication, so in the case of ECDHE, it is likely to be faster on both sides to stick with the combination of Jac1271 and Kum1271. Nevertheless, there could be scenarios where it makes sense to use the endomorphism on GLV128c (e.g. for signature verification), and still make use of the maps between the full Jacobian group and the associated Kummer surface. In this case, the map in (16) and the pullback map in [16, §4.3] can be exploited analogously to the case of Jac1271, keeping in mind that the maps would pass through the Jacobian of the Rosenhain form of $C$.

Timings for a 4-dimensional GLV variable-base scalar multiplication on GLV128c using both the old and the new explicit formulas are given in Section 8.

We note that in all scalar multiplication routines, i.e. in both fixed- and variable-base scalar multiplications on Jac1271 and in 4-dimensional multiexponentiations on GLV128c, we always found it advantageous to convert the lookup table elements from extended Jacobian coordinates to affine coordinates using Montgomery's simultaneous inversion method [30]. This “decision” is generally made easier in genus 2 than in the elliptic curve case, because the difference between mixed additions and full additions is greater (32M + 5S for mADD versus 41M + 7S for ADD in Table 2), and the relative cost of a field inversion (compared to the rest of the scalar multiplication routine) is much less. Finally, we note that the single conversion of the output point from Jacobian to affine coordinates comes at a cost of 1I + 10M + 1S.

7.3 A Disclaimer: The Difficulties Facing Constant-Time, Exception-Free Scalar Multiplications in $J_C$

We must point out that none of the scalar multiplications on Jac1271 or GLV128c that we report in this paper run in constant time, and that the difficulties of achieving such a routine in genus 2 Jacobians are closely related to Assumption 1. We note that these are not the same implementation-level difficulties pointed out in [3, §1.2]; indeed, while the Kummer surface implementations reported in [3] and [7] run in constant time, a truly constant-time genus 2 implementation that does not use the Kummer surface is yet to be documented in the literature.

More specifically, there are scalar recoding algorithms (cf. [20, 12]) that make it possible to implement the Jac1271 or GLV128c routines such that scalar multiplications on random inputs will run in constant time with probability exponentially close to 1. However, in order to guard against active adversaries and to be considered truly constant-time, the routines should be guaranteed to execute identically and run correctly for all combinations of integer scalars and input points; this means the explicit formulas must be able to handle input combinations in $J_C$ that are not “general” in the sense of Assumption 1. Although explicit formulas can be developed for each of these special cases, combining them into an efficient and truly constant-time scalar multiplication algorithm remains an important open problem.

8 Results 

In this section we present the timings of the routines described in the previous section. All of the benchmarks were performed on an Intel Core i7-3770M (Ivy Bridge) processor at 3.4 GHz with hyperthreading turned off and over-clocking (“turbo-boost”) disabled, and all but one of the cores switched off in BIOS. The implementations were compiled with gcc 4.6.3 with the -O2 flag set and tested on a 64-bit Linux environment. Cycles were obtained using the SUPERCOP [5] toolkit and then rounded to the nearest 1,000 cycles.

The primary purpose of our benchmarks is to compare the performance of scalar multiplications in genus 2 Jacobians using both the old and new sets of explicit formulas. Table 3 reports that a generic scalar multiplication on Jac1271 using the explicit formulas in this paper gives a factor 1.25x improvement over one that uses the previous best formulas; this is the approximate speedup that one can expect when adopting extended Jacobian coordinates on any imaginary hyperelliptic curve of genus 2 over a large prime field. Table 4 reports that a 4-dimensional GLV multiexponentiation routine using the explicit formulas in this paper gives a factor 1.29x improvement over the same routine that calls the previous explicit formulas. We note that the benchmarked implementations of the new formulas always used the “plain” versions (see Table 2), since these proved to be more efficient than the “trade-off” versions in our implementations.
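The quoted improvement factors are simply the ratios of the cycle counts reported in Tables 3 and 4:

$$
\frac{243{,}000}{195{,}000} \approx 1.25, \qquad \frac{166{,}000}{129{,}000} \approx 1.29.
$$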


Table 3. Benchmarking the old and new explicit formulas in the context of a generic scalar multiplication on Jac1271

| curve | coordinates | formulas from | cycles |
|---|---|---|---|
| Jac1271 | homogeneous | [?] | 243,000 |
| Jac1271 | ext. Jacobian | this work | 195,000 |

Table 4. Benchmarking the old and new explicit formulas in the context of a 4-GLV scalar multiplication on GLV128c

| curve | coordinates | formulas from | cycles |
|---|---|---|---|
| GLV128c | homogeneous | [?] | 166,000 |
| GLV128c | ext. Jacobian | this work | 129,000 |


As a secondary set of benchmarks, in Table 5 we give summary performance numbers for the Gaudry-Schost curve in §7.1 in the context of ECDHE. Using extended Jacobian coordinates and precomputing a lookup table of size 256KB, each key_gen operation takes around 40,000 cycles in total (36,000 cycles for the fixed-base scalar multiplication plus 4,000 cycles for the map $\Psi'$). (Note that this cycle count excludes the cycles required to transfer the lookup table from main memory to the cache.) Together with the recent Kummer surface performance numbers of Bernstein et al. [3], this gives an idea of the performance that is possible when space permits a significant precomputation in genus 2 ECDHE. Note, however, that until an efficient remedy to the issues discussed in §7.3 is known, this style of key_gen in genus 2 is unprotected against side-channel attacks. We also benchmarked a fixed-base scalar multiplication with a much smaller 1KB lookup table, but it ran in 87,000 cycles, which, when combined with the 4,000-cycle map, is not faster than the 91,000-cycle scalar multiplication on Kum1271 from [3].


Table 5. The performance of genus 2 in ECDHE on the Gaudry-Schost curve [18]

| ECDHE operation | details | curve | implementation | cycles |
|---|---|---|---|---|
| key_gen | fixed-base scalar mul | Jac1271 | this work | 36,000 |
| key_gen | $\Psi'$ map | Jac1271 | this work (and [6]) | 4,000 |
| shared_secret | variable-base scalar mul | Kum1271 | Bernstein et al. [3] | 91,000 |


We reiterate that, to get the performance numbers in Tables 3 and 4 and those for key_gen in Table 5, we modified the software made publicly available by Bos et al. [7] to be able to call both sets of explicit formulas. This software already included routines for general scalar multiplications, 4-GLV scalar multiplications, and the fixed-base scenario. To complete the benchmarks in Table 5, we ran the publicly available software from [3] on our hardware.

9 Related Scenarios 

We conclude by mentioning some related cases of interest, for which the analogue 
of (extended) Jacobian coordinates and/or the co-Z idea could also be applied. 
The takeaway message of this section is that, while we focussed on the most 
common instance of genus 2 curves, the ideas in this work have the potential to 
boost the speed of arithmetic in other scenarios too. 

– Real Hyperelliptic Curves. In Section 2 we immediately specialised to the imaginary case, where $C/K$ is hyperelliptic of degree 5 with one point at infinity. The other case in genus 2, where the curve is of degree 6 and has two points at infinity [?], has received less attention in papers pursuing high performance, since it is slightly slower than the imaginary case [?]. Moreover, it is often the case (at least among the scenarios of practical interest) that a degree 6 model contains a rational Weierstrass point and can therefore be transformed to a degree 5 model (e.g. the family in [?, §4.4]). On the other hand, there are some scenarios where this transformation is not possible, so it is of interest to see how efficient projective arithmetic can be made in the real case, and whether analogues of the ideas in this work can be carried across successfully.

– Pairings. Genus 2 pairings are also likely to benefit from Jacobian coordinates. Roughly speaking, the explicit formulas in this paper inherently compute the additional components (i.e. the Miller functions) that are required in a pairing computation. However, the resulting savings would not be as drastic, as the operations in $J_C$ are dominated by extension field operations in a pairing computation. In addition, genus 2 has not been as competitive in the realm of pairings as it has as a standard discrete logarithm primitive, largely because the construction of competitive ordinary, pairing-friendly hyperelliptic curves has been very limited. On the other hand, there are attractive constructions of supersingular genus 2 curves [?], which may be of interest in the “Type 1” setting, especially given that the fastest instantiations of such pairings are (in recent times) considered broken [2]. Interestingly, the construction in [?, §7] is one example of a scenario where the real model cannot be converted into an imaginary one in general.

– Low Characteristic / Higher Genus. The specialisation of Jacobian coordinates to low characteristic genus 2 curves and the extension to higher genus imaginary hyperelliptic curves follow analogously. However, the motivation in both directions is nowadays stunted by their respective security concerns. Nevertheless, it could be worthwhile to see how much faster the arithmetic in these cases can become when using Jacobian coordinates.

– The RM Families. We benchmarked the new explicit formulas in two scenarios: on a non-special “generic” curve, and on a curve with very special CM that consequently comes equipped with an endomorphism. A third option, which perhaps achieves the best of both worlds in genus 2, comes from the families with explicit RM in [?]: these families also come equipped with an endomorphism, but are much more general than the CM curve we used. This generality dispels any security concerns associated with special curves, and moreover allows them to be found over a fixed prime field. Thus, at the 128-bit security level, one could find such a curve over $p = 2^{127} - 1$ that facilitates a 2-dimensional GLV decomposition on its Jacobian and that supports a (twist-secure) Kummer surface. It would then be interesting to benchmark the new explicit formulas on one of these families, where the GLV routine would again make a higher relative frequency of calls to the fast mDBLADD routine.
Abstract. We present work in progress to completely factor seventeen Mersenne numbers using a variant of the special number field sieve where sieving on the algebraic side is shared among the numbers. It is expected that it reduces the overall factoring effort by more than 50%. As far as we know this is the first practical application of Coppersmith’s “factorization factory” idea. Most factorizations used a new double-product approach that led to additional savings in the matrix step.

Keywords: Mersenne numbers, factorization factory, special number 
field sieve, block Wiedemann algorithm. 


1 Introduction 

Despite its allegedly waning cryptanalytic importance, integer factorization is still an interesting subject and it remains relevant to test the practical value of promising approaches that have not been tried before. An example of the latter is Coppersmith’s by now classical suggestion to amortize the cost of a precomputation over many factorizations [7]. The reason for the lack of practical validation of this method is obvious: achieving even a single “interesting” (i.e., record) factorization usually requires such an enormous effort [18] that an attempt to use Coppersmith’s idea to obtain multiple interesting factorizations simultaneously would be prohibitively expensive, and meeting its storage requirements would be challenging.

But these arguments apply only to general numbers, such as RSA moduli [30], which are the context of Coppersmith’s method. Given long-term projects such as [?] where many factoring enthusiasts worldwide constantly busy themselves factoring many special numbers, such as small-radix repunits, it makes sense to investigate whether factoring efforts that are eagerly pursued no matter what can be combined to save on the overall amount of work. This is what we set out to do here: we applied Coppersmith’s factorization factory approach in order to simultaneously factor seventeen radix-2 repunits, so-called Mersenne numbers. Apart from their appeal to makers of mathematical tables, such factorizations may be useful as well [?].

* Part of this work was conducted while this author was at Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA.

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, PART I, LNCS 8873, pp. 358–377, 2014.

Let $S = \{1007, 1009, 1081, 1093, 1109, 1111, 1117, 1123, 1129, 1147, 1151, 1153, 1159, 1171, 1177, 1193, 1199\}$. For all $n \in S$ we have determined, or are in the process of determining (for updates, see [19]), the full factorization of $2^n - 1$,
using the method proposed in [7, Section 4] adapted to the special number field sieve (SNFS, [22]). Furthermore, for two of the numbers a (new, but rather obvious) multi-SNFS approach was exploited as well. Most of our new factorizations will soundly beat the previous two SNFS records, the full factorizations of $2^{1039} - 1$ and $2^{1061} - 1$ reported in [?] and [?], respectively. Measuring individual (S)NFS efforts, factoring $2^{1193} - 1$ would require about 20 times the effort of factoring $2^{1039} - 1$, or more than twice the effort of factoring the 768-bit RSA modulus from [18]. Summing the individual efforts for the seventeen numbers involved would amount to more than one hundred times the $(2^{1039} - 1)$-effort. Extrapolating our results so far, we expect that sharing the work à la Coppersmith will allow us to do it in about 50 times that effort. The practical implications of Coppersmith’s method for general composites remain to be seen.

Although the factoring efforts reported here shared parts of the sieving tasks, each factorization still required its own separate matrix step. With seventeen numbers to be factored, and thus seventeen matrices to be dealt with, this gave us, and is still giving us, ample opportunity to experiment with a number of new algorithmic tricks in our block Wiedemann implementation, following up on the work reported in [?] and [18]. While the savings we obtained are relatively modest compared to the overall matrix effort involved, they are substantial in absolute terms. Several of the matrices that we have dealt with, or will be dealing with, are considerably larger than the one from [18], the largest published comparable matrix problem before this work.

Section 2 gives background on the (S)NFS and Coppersmith’s method as required for the paper. Section 3 introduces our two sets of target numbers to be factored, while Sections 4 and 5 describe our work so far when applying the two main steps of the SNFS to these numbers. Section 6 provides evidence of the work completed so far, and Section 7 presents a few concluding remarks.

All core years reported below are normalized to 2.2 GHz cores. 


2 Background on (S)NFS and Coppersmith’s Method 

2.1 Number Field Sieve 

To factor a composite integer $N$ in the current range of interest using the number field sieve (NFS, [22]), a linear polynomial $g \in \mathbb{Z}[X]$ and a degree $d > 1$ polynomial $f \in \mathbb{Z}[X]$ are determined such that $g$ and $f$ have, modulo $N$, a root $m \approx N^{1/(d+1)}$ in common. For any $m$ one may select $g(X) = X - m$ and

$$f(X) = \sum_{i=0}^{d} f_i X^i, \quad\text{where } N = \sum_{i=0}^{d} f_i m^i \text{ and } 0 \le f_i < m \ \big(\text{or } |f_i| \le \tfrac{m}{2}\big) \text{ for } 0 \le i \le d.$$

Traditionally, everything related to the linear polynomial $g$ is referred to as “rational” and everything related to the non-linear polynomial $f$ as “algebraic”.
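The base-$m$ choice described above fits in a few lines of Python. This is a minimal illustration under our own assumptions. It is not the polynomial selection used for this paper. The helper names `integer_root` and `base_m_polynomial` are hypothetical. The sketch takes $m$ as one more than $\lfloor N^{1/(d+1)} \rfloor$, so that $N$ fits in $d+1$ base-$m$ digits and $f(m) = N \equiv 0 \bmod N$.

```python
def integer_root(n: int, k: int) -> int:
    """Largest integer r with r**k <= n, by binary search on exact integers."""
    lo, hi = 0, 1 << (n.bit_length() // k + 1)   # hi**k > n is guaranteed
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def base_m_polynomial(N: int, d: int) -> tuple[int, list[int]]:
    """Base-m selection: return m and [f_0, ..., f_d] with N = sum(f_i * m**i)
    and 0 <= f_i < m, so g(X) = X - m and f(X) = sum(f_i X^i) share the root m mod N."""
    m = integer_root(N, d + 1) + 1       # just above N^(1/(d+1)), so N < m^(d+1)
    coeffs, rest = [], N
    for _ in range(d + 1):
        rest, digit = divmod(rest, m)     # next base-m digit is the next coefficient
        coeffs.append(digit)
    if rest:
        raise ValueError("N needs more than d + 1 base-m digits")
    return m, coeffs


N = 101 * 103                             # toy composite 10403
m, f = base_m_polynomial(N, 2)
print(m, f)                               # 22 [19, 10, 21], i.e. f(X) = 21X^2 + 10X + 19
```

The carefully selected polynomials mentioned below would replace this naive expansion. The identity $N = \sum f_i m^i$ stays the same.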

Relations are pairs of coprime integers $a, b$ with $b > 0$ such that $bg(a/b)$ and $b^d f(a/b)$ have only small factors, i.e., are smooth. Each relation corresponds to the vector consisting of the exponents of the small factors (omitting details that are not relevant for the present description). Therefore, as soon as more relations have been collected than there are small factors, the vectors are linearly dependent, and a matrix step can be used to determine an even sum of the vectors. Each such sum has probability at least 50% to lead to a non-trivial factor of $N$.
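The sketch below makes the notion of a relation concrete. It searches a tiny box of coprime pairs for which both $bg(a/b) = a - bm$ and $b^d f(a/b) = \sum_i f_i a^i b^{d-i}$ are smooth, using plain trial division. It assumes the toy base-22 polynomial pair for $N = 10403 = 101 \cdot 103$, namely $f(X) = 21X^2 + 10X + 19$ and $g(X) = X - 22$. The function names and bounds are illustrative only and bear no relation to the parameters used in practice.

```python
from math import gcd


def small_primes(bound: int) -> list[int]:
    """Primes p <= bound (sieve of Eratosthenes)."""
    flags = [True] * (bound + 1)
    flags[0] = flags[1] = False
    for p in range(2, int(bound**0.5) + 1):
        if flags[p]:
            for multiple in range(p * p, bound + 1, p):
                flags[multiple] = False
    return [p for p, is_prime in enumerate(flags) if is_prime]


def is_smooth(value: int, primes: list[int]) -> bool:
    """True if value is nonzero and |value| factors completely over `primes`."""
    v = abs(value)
    if v == 0:
        return False
    for p in primes:
        while v % p == 0:
            v //= p
    return v == 1


def find_relations(f: list[int], m: int, a_bound: int, b_bound: int, smooth_bound: int):
    """Coprime (a, b) with |a| <= a_bound, 0 < b <= b_bound such that the rational value
    a - b*m and the homogenized algebraic value sum f_i a^i b^(d-i) are both smooth."""
    primes = small_primes(smooth_bound)
    d = len(f) - 1
    relations = []
    for b in range(1, b_bound + 1):
        for a in range(-a_bound, a_bound + 1):
            if gcd(a, b) != 1:
                continue
            rational = a - b * m
            algebraic = sum(c * a**i * b ** (d - i) for i, c in enumerate(f))
            if is_smooth(rational, primes) and is_smooth(algebraic, primes):
                relations.append((a, b))
    return relations


relations = find_relations([19, 10, 21], 22, a_bound=50, b_bound=20, smooth_bound=50)
print(len(relations), relations[:5])
```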

Balancing the smoothness probability and the number of relations required (both of which grow with the number of small factors), the overall heuristic expected NFS factoring time is $L((64/9)^{1/3}) \approx L(1.923)$ asymptotically for $N \to \infty$, where

$$L(c) = L_N[\tfrac{1}{3}, c] \quad\text{and}\quad L_N[\rho, c] = \exp\big((c + o(1))(\log(N))^{\rho}(\log(\log(N)))^{1-\rho}\big)$$

for $0 \le \rho \le 1$, and the degree $d$ is chosen as an integer close to $(3\log(N)/\log(\log(N)))^{1/3}$. A more careful selection of $g$ and $f$ than that suggested above (following for instance [?]) can lead to a substantial overall speed-up but has no effect on the asymptotic runtime expression.

For regular composites the $f_i$ grow as $N^{1/(d+1)}$, which is only $N^{o(1)}$ for $N \to \infty$ but in general not $O(1)$. Composites for which the $f_i$ are $O(1)$ are “special”, and the SNFS applies: its heuristic expected runtime is $L((32/9)^{1/3}) \approx L(1.526)$ asymptotically for $N \to \infty$, where the degree $d$ is chosen as an integer close to $(3\log(N)/(2\log(\log(N))))^{1/3}$. Both asymptotically and in practice the SNFS is much faster than the NFS, with a slowly widening gap: for 1000-bit numbers the SNFS is more than ten thousand times faster, for 1200-bit numbers it is more than 30 thousand times faster.
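To get a rough numerical feel for these exponents, we can evaluate $L_N[1/3, c]$ with the $o(1)$ term simply dropped. The sketch below does this for the NFS and SNFS constants, using `L`, a helper of our own. Because the $o(1)$ is ignored, its ratios only indicate the widening trend. They should not be read as reproducing the speed-up figures quoted above.

```python
import math


def L(bits: int, c: float, rho: float = 1 / 3) -> float:
    """L_N[rho, c] for N of about `bits` bits, with the o(1) term dropped."""
    ln_n = bits * math.log(2)
    return math.exp(c * ln_n**rho * math.log(ln_n) ** (1 - rho))


NFS_C = (64 / 9) ** (1 / 3)     # about 1.923
SNFS_C = (32 / 9) ** (1 / 3)    # about 1.526

for bits in (1000, 1200):
    ratio = L(bits, NFS_C) / L(bits, SNFS_C)
    print(f"{bits}-bit N: NFS/SNFS cost ratio ~ {ratio:.1e} (o(1) ignored)")
```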

The function $L(c)$ satisfies various useful but unusual properties, due to the $o(1)$ and $N \to \infty$: $L(c_1)L(c_2) = L(c_1 + c_2)$, $L(c_1) + L(c_2) = L(\max(c_1, c_2))$, and for $c > 0$ and fixed $k$ it is the case that $(\log(N))^k L(c) = L(c)/\log(L(c)) = L(c)$.

2.2 Relation Collection 

We briefly discuss some aspects of the relation collection step that are relevant for the remainder of the paper and that apply to both the NFS and the SNFS. Let $N$ be the composite to be factored, let $c = (64/9)^{1/3}$ (but $c = (32/9)^{1/3}$ if $N$ is special), and assume the proper corresponding $d$ as above. Heuristically it is asymptotically optimal to choose $L(c/2)$ as the upper bound for the small factors in the polynomial values, and to search for relations among the integer pairs $(a, b)$ with $|a| \le L(c/2)$ and $0 < b \le L(c/2)$. For the NFS the rational and algebraic polynomial values then have heuristic probabilities $L(-c/8)$ and $L(-3c/8)$ to be smooth, respectively; for the SNFS both probabilities are $L(-c/4)$. Either way (i.e., NFS or SNFS) and assuming independence of the polynomial values, the polynomial values are both smooth with probability $L(-c/2)$. Over the entire search space $L(c)L(-c/2) = L(c/2)$ relations may thus be expected, which suffices.
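The SNFS probabilities quoted above can be checked with the standard heuristic, also used in Section 2.5, that a number of size $L[2/3, \sigma]$ is $L(\beta)$-smooth with probability $L(-\sigma/(3\beta))$. The derivation below is our own worked sketch. It assumes $d = \delta(\log(N)/\log(\log(N)))^{1/3}$, $m \approx N^{1/d}$ and $|a|, b \le L(c/2)$:

```latex
\begin{align*}
  b\,g(a/b) &\approx L(c/2)\,N^{1/d} = L\!\left[\tfrac23, \tfrac1\delta\right]
    &&\Longrightarrow\ \Pr[\text{smooth}] = L\!\left(-\tfrac{1/\delta}{3 \cdot c/2}\right) = L\!\left(-\tfrac{2}{3c\delta}\right),\\
  b^d f(a/b) &\approx L(c/2)^d = L\!\left[\tfrac23, \tfrac{c\delta}{2}\right]
    &&\Longrightarrow\ \Pr[\text{smooth}] = L\!\left(-\tfrac{c\delta/2}{3 \cdot c/2}\right) = L\!\left(-\tfrac{\delta}{3}\right),\\
  \delta = \tfrac{3c}{4},\ c^3 = \tfrac{32}{9}
    &\Longrightarrow \tfrac{2}{3c\delta} = \tfrac{8}{9c^2} = \tfrac{c}{4} = \tfrac{\delta}{3}
    &&\Longrightarrow\ L(c)\,L(-\tfrac{c}{4})\,L(-\tfrac{c}{4}) = L(\tfrac{c}{2}).
\end{align*}
```

The choice $\delta = 3c/4$ gives $\delta^3 = 3/2$. This matches the SNFS degree $(3\log(N)/(2\log(\log(N))))^{1/3}$ from Section 2.1.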

Relation collection can be done using sieving because the search space is a rectangle in $\mathbb{Z}^2$ and because polynomial values are considered. The latter implies that if $p$ divides $g(s)$ (or $f(s)$), then $p$ divides $g(s + kp)$ (or $f(s + kp)$) for any integer $k$; the former implies that, given $s$, all corresponding values $s + kp$ in the search space are quickly located. Thus, for one of the polynomials, sieving is used to locate all pairs in the search space for which the corresponding polynomial value has only factors bounded by $L(c/2)$. This costs

$$\sum_{p \text{ prime},\ p \le L(c/2)} \frac{L(c)}{p} \approx L(c)\log(\log(L(c/2))) = L(c)$$

(for $N \to \infty$, due to the $o(1)$ in $L(c)$) and leads to pairs for which the polynomial value is smooth. Next, in the same way and at the same cost, the pairs are located for which the other polynomial value is smooth. Intersecting the two sets leads to $L(c/2)$ pairs for which both polynomial values are smooth.
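The arithmetic-progression property behind sieving can be shown on a single line $b = \text{const}$ of the rational side. There $bg(a/b) = a - bm$ is divisible by $p$ exactly when $a \equiv bm \pmod p$. The sketch below is a toy under stated assumptions. Production sievers add logarithms and inspect candidates afterwards; this version instead divides out every prime exactly, so its output can be compared directly with trial division. Names such as `line_sieve` are hypothetical, and coprimality of $a$ and $b$ is left to a later filter.

```python
def small_primes(bound: int) -> list[int]:
    """Primes p <= bound (sieve of Eratosthenes)."""
    flags = [True] * (bound + 1)
    flags[0] = flags[1] = False
    for p in range(2, int(bound**0.5) + 1):
        if flags[p]:
            for multiple in range(p * p, bound + 1, p):
                flags[multiple] = False
    return [p for p, is_prime in enumerate(flags) if is_prime]


def trial_division_smooth(value: int, primes: list[int]) -> bool:
    """Reference check: nonzero value factoring completely over `primes`."""
    v = abs(value)
    if v == 0:
        return False
    for p in primes:
        while v % p == 0:
            v //= p
    return v == 1


def line_sieve(b: int, m: int, a_min: int, a_max: int, primes: list[int]) -> list[int]:
    """All a in [a_min, a_max] with a - b*m smooth over `primes`.
    For each p only every p-th entry is touched: those with a = b*m (mod p)."""
    residue = [abs(a - b * m) for a in range(a_min, a_max + 1)]
    for p in primes:
        start = (b * m - a_min) % p                  # index of first a with a = b*m (mod p)
        for idx in range(start, len(residue), p):
            while residue[idx] and residue[idx] % p == 0:   # the zero value is skipped
                residue[idx] //= p
    return [a_min + i for i, r in enumerate(residue) if r == 1]


primes = small_primes(50)
hits = line_sieve(3, 22, -200, 200, primes)
print(hits[:10])
```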

Sieving twice, once for each polynomial, works asymptotically because $L(c) + L(c) = L(c)$. It is less obvious that this is also a good approach in practice. After the first sieve, only pairs remain that are smooth with respect to the first polynomial. Processing those individually for the second polynomial could therefore be more efficient than reconsidering the entire rectangular search space with another sieve. Which method should be used depends on the circumstances. For the regular (S)NFS, using two sieves is most effective, both asymptotically and in practice. Sieving is done twice in a “quick and dirty” manner, relying on the intersection of the two sets to quickly reduce the number of remaining pairs; these are then inspected more closely to extract the relations. In Section 2.4, however, different considerations come into play, and one cannot afford a second sieve, asymptotically or in practice, precisely because a second sieve would look at too many values.

As suggested in [28], the sieving task is split up into a large number of somewhat overlapping but sufficiently disjoint subtasks. Given a root $z$ modulo a large prime $q$ of one of the polynomials, a subtask consists of sieving only those pairs $(a, b)$ for which $a/b \equiv z \bmod q$, and for which therefore the values of that polynomial are divisible by $q$. This implies that the original size $L(c)$ rectangular search space is intersected with an index-$q$ sublattice of $\mathbb{Z}^2$, resulting in a size $L(c)/q$ search space. Sieving can still be used in the new smaller search space, but in a somewhat more complicated manner [28], as first done in [15] and later much better in [?]. Also, more liberal smoothness criteria allow several primes larger than $L(c/2)$ in either polynomial value [11]. This complicates the decision of when enough relations have been collected and may increase the matrix size, but leads to a substantial overall speed-up. Another complication is that duplicate relations will be found, i.e., the same relation may be found by different subtasks, so the collection of relations must be made duplicate-free before further processing.

2.3 Matrix and Filtering 

Assume that the numbers of distinct rational and algebraic small primes allowed in the smooth values during relation collection equal $r_1$ and $r_2$, respectively. With $r = r_1 + r_2$, each relation corresponds to an $r$-dimensional vector of exponents. With many distinct potential factors (i.e., large $r_1$ and $r_2$) of which only a few occur per smooth value, the exponent vectors are huge-dimensional (with $r$ on the order of billions) and very sparse (on average about 20 non-zero entries).

As soon as $r + 1$ relations have been collected, an even sum of the corresponding $r$-dimensional vectors (as required to derive a factorization) can in principle be found using linear algebra. Let $v$ be one of the vectors, and let the others constitute the columns of an $r \times r$ matrix $M_{\mathrm{raw}}$; an $r$-dimensional bit-vector $x$ for which $M_{\mathrm{raw}} x$ equals $v$ modulo 2 provides the solution. A solution has at least a 50% chance to produce a non-trivial factorization, but it may fail to do so. In practice, therefore, somewhat more relations are used, and more than a single independent solution is derived.
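For toy sizes, the even sum can be found by ordinary Gaussian elimination over GF(2), with each parity vector packed into a Python integer. This sketch only shows the goal of the linear algebra. The matrices in this paper are far too large and sparse for dense elimination and are handled with block Wiedemann (cf. Section 5). `parity_vector` and `find_dependency` are hypothetical helper names.

```python
from typing import Optional


def parity_vector(value: int, primes: list[int]) -> int:
    """Exponent vector of a smooth positive value modulo 2, packed into an int
    (bit i corresponds to primes[i])."""
    vec = 0
    for i, p in enumerate(primes):
        exponent = 0
        while value % p == 0:
            value //= p
            exponent += 1
        if exponent % 2:
            vec |= 1 << i
    if value != 1:
        raise ValueError("value is not smooth over the given primes")
    return vec


def find_dependency(vectors: list[int]) -> Optional[list[int]]:
    """Indices of a nonempty subset whose XOR (sum over GF(2)) is zero, found by
    incremental Gaussian elimination; None if the vectors are independent."""
    basis = {}                          # pivot bit -> (reduced vector, subset bitmask)
    for idx, v in enumerate(vectors):
        subset = 1 << idx
        while v:
            pivot = v.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (v, subset)
                break
            bv, bs = basis[pivot]
            v ^= bv                     # eliminate the pivot bit
            subset ^= bs                # and record which inputs were combined
        else:
            # v reduced to zero: the recorded subset is an even sum
            return [i for i in range(len(vectors)) if subset >> i & 1]
    return None


primes = [2, 3, 5, 7]
values = [6, 10, 15, 21]                # toy smooth values
vectors = [parity_vector(x, primes) for x in values]
print(find_dependency(vectors))         # [0, 1, 2]: 6 * 10 * 15 = 900 = 30^2
```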

The effort required to find solutions (cf. Section 5) grows with the product of the dimension $r$ and the number of non-zero entries of $M_{\mathrm{raw}}$ (the weight of $M_{\mathrm{raw}}$). A preprocessing filtering step is first applied to $M_{\mathrm{raw}}$ to reduce this product as much as is practically possible. It consists of a “best effort” to transform the initial huge-dimensional matrix $M_{\mathrm{raw}}$ of very low average column weight, using a sequence of transformation matrices, into a matrix $M$ of much lower dimension but still sufficiently low weight. It is not uncommon to continue relation collection until a matrix $M$ can be created in this way that is considered to be “doable”. Our use of a second algebraic polynomial for some of our factorizations takes this idea a bit further than usual (cf. Sections 4.4 and 5). Solutions for the original matrix $M_{\mathrm{raw}}$ easily follow from solutions for the resulting filtered matrix $M$.
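One of the simplest filtering transformations is singleton removal, which is also mentioned in Section 4.3. A relation containing a prime ideal (to an odd power) that occurs in no other relation cannot take part in an even sum. Removing such a relation may create new singletons, so the step is repeated. The sketch below assumes each relation is given as the set of its prime ideals that occur to an odd power. Real filtering does considerably more (for instance, merging relations), which is not shown.

```python
from collections import Counter


def remove_singletons(relations: list[frozenset]) -> list[frozenset]:
    """Repeatedly drop relations that contain an ideal occurring in no other relation."""
    current = list(relations)
    while True:
        counts = Counter(ideal for rel in current for ideal in rel)
        kept = [rel for rel in current if all(counts[ideal] > 1 for ideal in rel)]
        if len(kept) == len(current):   # no singleton left: fixed point reached
            return kept
        current = kept


rels = [frozenset({2, 3}), frozenset({3, 5}), frozenset({2, 5}), frozenset({5, 7})]
print(remove_singletons(rels))          # the relation containing ideal 7 is dropped
```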


2.4 Coppersmith’s Factorization Factory 

Coppersmith, in [7, Section 4], observed that a single linear polynomial $g$ may be used for many different composites as long as their $(d+1)$st roots are not too far apart, with each composite still using its own algebraic polynomial. Thus smooth $bg(a/b)$-values can be precomputed in a sieving step and used for each of the different factorizations, while amortizing the precomputation cost. We sketch how this works, referring to [7, Section 4] for the details.

After sieving over a rectangular region of $L(2.007)$ rational polynomial values with smoothness bound $L(0.819)$, a total of $L(1.639)$ pairs can be expected for which the rational polynomial value is smooth; these must be stored for future use. Using this stored table of $L(1.639)$ pairs corresponding to smooth rational polynomial values, any composite in the correct range can be factored at cost $L(1.639)$ per composite. The main costs per number are the algebraic smoothness detection, again with smoothness bound $L(0.819)$, and the matrix step. Factoring $t = L(\varepsilon)$ such integers costs $L(\max(2.007, 1.639 + \varepsilon))$. This is advantageous compared to $t$-fold application of the regular NFS (at cost $L(1.923)$ per application) for $t > L(0.084)$. Thus, after a precomputation effort of $L(2.007)$, individual numbers can be factored at cost $L(1.639)$, compared to the individual factorization cost $L(1.923)$ using the regular NFS.
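The break-even point $t > L(0.084)$ follows from the rules $L(c_1)L(c_2) = L(c_1 + c_2)$ and $L(c_1) + L(c_2) = L(\max(c_1, c_2))$ of Section 2.1, writing $t = L(\varepsilon)$. A short worked version, our own restatement of the argument above:

```latex
\begin{align*}
  \text{$t$ regular NFS runs:}\quad & t \cdot L(1.923) = L(\varepsilon)\,L(1.923) = L(1.923 + \varepsilon),\\
  \text{factory:}\quad & L(2.007) + t \cdot L(1.639) = L\big(\max(2.007,\ 1.639 + \varepsilon)\big),\\
  \text{factory cheaper}\quad & \iff \max(2.007,\ 1.639 + \varepsilon) < 1.923 + \varepsilon
    \iff 2.007 < 1.923 + \varepsilon \iff \varepsilon > 0.084,
\end{align*}
```

where the second-to-last step uses $1.639 + \varepsilon < 1.923 + \varepsilon$ for every $\varepsilon$.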

During the precomputation the $L(1.639)$ pairs for which the rational polynomial value is smooth are found by sieving $L(2.007)$ locations. This implies that, from an asymptotic runtime point of view, a sieve should not be used to test the resulting $L(1.639)$ pairs for algebraic smoothness (with respect to an applicable algebraic polynomial). Sieving would cost $L(2.007)$, so each individual factorization would cost more than a regular application of the NFS. Asymptotically, this issue is resolved by using the elliptic curve factoring method (ECM, [24]) for the algebraic smoothness test: for smoothness bound $L(0.819)$, it processes each pair at cost $L(0)$, resulting in an overall algebraic smoothness detection cost of $L(1.639)$. In practice, if it ever comes that far, the ECM may indeed be the best choice, factorization trees ([3] and [?, Section 4]) may be used, or sieving may simply be the fastest option. The smooth rational polynomial values will be used by all factorizations. In practice, therefore, the rational precomputation should probably include, after the sieving, the actual determination of all pairs for which the rational polynomial value is smooth. In the regular (S)NFS, this closer inspection of the sieving results takes place only after completing both sieves.

These are asymptotic results, but the basic idea can be applied on a much smaller scale too. Consider a small number $t$ of sufficiently close composites, factored with the original NFS parameter choices (and thus a table of $L(1.683)$ as opposed to $L(1.639)$ pairs). The gain then approaches 50% with growing $t$, assuming the matrix cost is relatively minor and disregarding table-storage issues. It remains to be seen, however, whether for such small $t$ individual processing is not better if each composite uses a carefully selected pair of polynomials as in [?]. It also remains to be seen whether that effect can be countered by increasing the rational search space a bit while decreasing the smoothness bounds (as in the analysis from [?]).

We are not aware of practical experimentation with Coppersmith’s method. 
To make it realistically doable (in an academic environment) a few suitable 
moduli could be concocted. The results would, however, hardly be convincing, and deriving them would be mostly a waste of computer time and of electric power [20]. We opted for a different approach to gain practical experience with the factorization factory idea, as described below.


2.5 SNFS Factorization Factory 

If we switch the roles of the rational and algebraic sides in Coppersmith’s factorization factory, we get a method to factor numbers that share the same algebraic polynomial but have different rational polynomials. Such numbers are readily available in the Cunningham project [?]¹. They have the additional advantage that obtaining their factorizations is deemed to be desirable, so an actual practical experiment may be considered a worthwhile effort. Our choice of target numbers is described in Section 3. First we present the theoretical analysis of the factorization factory with a fixed algebraic polynomial with $O(1)$ coefficients, i.e., the SNFS factorization factory.

Let $L(2\alpha)$ be the size of the sieving region for the fixed shared algebraic polynomial (with coefficient size $O(1)$), and let $L(\beta)$ and $L(\gamma)$ be the algebraic and rational smoothness bounds, respectively. Assume the degree of the algebraic polynomial can be chosen as $\delta(\log(N)/\log(\log(N)))^{1/3}$ for all numbers to be factored.

¹ On an historical note, the desire to factor the ninth Fermat number $2^{2^9} + 1$, in 1988 the “most wanted” unfactored Cunningham number, inspired the invention of the SNFS, triggering the development of the NFS; the details are described in [22].

The algebraic polynomial values are of size $L[\frac{2}{3}, \alpha\delta]$ and are thus assumed to be smooth with probability $L(-\frac{\alpha\delta}{3\beta})$ (cf. [21, Section 3.16]). With the coefficients of the rational polynomials bounded by $L[\frac{2}{3}, \frac{1}{\delta}]$, the rational polynomial values are of size $L[\frac{2}{3}, \frac{1}{\delta}]$ and may be assumed to be smooth with probability $L(-\frac{1}{3\gamma\delta})$. To be able to find sufficiently many relations it must therefore be the case that

$$2\alpha - \frac{\alpha\delta}{3\beta} - \frac{1}{3\gamma\delta} \ge \max(\beta, \gamma). \qquad (1)$$

The precomputation (algebraic sieving) costs $L(2\alpha)$ and produces $L(2\alpha - \frac{\alpha\delta}{3\beta})$ pairs for which the algebraic value is smooth. Per number to be factored, a total of $L(\max(\beta, \gamma) + \frac{1}{3\gamma\delta})$ of these pairs are tested for smoothness (with respect to $L(\gamma)$), resulting in an overall factoring cost

$$L\Big(\max\big(2\beta,\ 2\gamma,\ \max(\beta, \gamma) + \tfrac{1}{3\gamma\delta}\big)\Big)$$

per number. If $\beta \ne \gamma$, then replacing the smaller of $\beta$ and $\gamma$ by the larger increases the left hand side of condition (1), leaves the right hand side unchanged, and does not increase the overall cost. Thus, for optimal parameters, it may be assumed that $\beta = \gamma$. This simplifies the cost to $L(\max(2\gamma, \gamma + \frac{1}{3\gamma\delta}))$ and condition (1) to

$$\Big(2 - \frac{\delta}{3\gamma}\Big)\alpha \ge \gamma + \frac{1}{3\gamma\delta},$$

which holds for some $\alpha > 0$ as long as $\delta < 6\gamma$. Fixing $\delta$, the cost is minimized when $2\gamma = \gamma + \frac{1}{3\gamma\delta}$ or when $\gamma + \frac{1}{3\gamma\delta}$ attains its minimum; these two conditions are equivalent and the minimum is attained for $\gamma = (3\delta)^{-1/2}$. The condition $\delta < 6\gamma$ translates into $\delta < 12^{1/3}$, or equivalently $\gamma > 18^{-1/3}$. It follows that for $\delta$ approaching $12^{1/3}$ from below, the factoring cost per number approaches $L((4/9)^{1/3}) \approx L(0.763)$ from above, with a precomputation cost of $L(2\alpha)$, $\alpha \to \infty$. These SNFS factorization factory costs should be compared to the individual factorization cost $L((32/9)^{1/3}) \approx L(1.526)$ using the regular SNFS, and to the approximate individual factoring cost $L(1.639)$ after a precomputation at approximate cost $L(2.007)$ using Coppersmith’s NFS factorization factory.

Assuming $\gamma = (3\delta)^{-1/2}$, the choices $\gamma = (2/9)^{1/3}$ (i.e., $\delta = (3/4)^{1/3}$) and $\alpha = (128/243)^{1/3}$ lead to minimal precomputation cost $L((4/3)^{5/3}) \approx L(1.615)$ and individual factoring cost $L((4/3)^{2/3}) \approx L(1.211)$. This makes the approach advantageous if more than approximately $L(0.089)$ numbers must be factored (compare this to $L(0.084)$ for Coppersmith’s factorization factory). However, with more numbers to be factored, another choice for $\gamma$ (and thus a larger $\alpha$) may be advantageous (cf. the more complete analysis in [19]).
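The optimization above is easy to check numerically. The sketch below is our own, and `factory_parameters` is a hypothetical helper. It sets $\beta = \gamma = (3\delta)^{-1/2}$, takes the smallest $\alpha$ allowed by the simplified condition, and scans $\delta$ for the smallest precomputation exponent $2\alpha$. It should land near $\delta = (3/4)^{1/3}$, with a per-number exponent of about 1.211, a precomputation exponent of about 1.615 and a break-even of about 0.089, in line with the values stated above.

```python
import math


def factory_parameters(delta: float) -> tuple[float, float]:
    """(gamma, alpha) for the SNFS factorization factory with degree parameter delta.

    gamma = (3*delta)**(-1/2) minimizes the per-number cost (with beta = gamma);
    alpha is the smallest value with (2 - delta/(3 gamma)) alpha >= gamma + 1/(3 gamma delta),
    or math.inf when no alpha > 0 works (delta >= 12**(1/3)).
    """
    gamma = (3 * delta) ** -0.5
    slack = 2 - delta / (3 * gamma)              # coefficient of alpha in condition (1)
    if slack <= 0:
        return gamma, math.inf
    return gamma, (gamma + 1 / (3 * gamma * delta)) / slack


SNFS_C = (32 / 9) ** (1 / 3)                     # regular SNFS exponent, about 1.526
grid = [k / 10_000 for k in range(1, 23_000)]    # delta in (0, 2.3)
best = min(grid, key=lambda d: factory_parameters(d)[1])
gamma, alpha = factory_parameters(best)
print(f"delta ~ {best:.4f}  (closed form (3/4)^(1/3) = {0.75 ** (1 / 3):.4f})")
print(f"per number L({2 * gamma:.3f}), precomputation L({2 * alpha:.3f})")
print(f"break-even: more than L({2 * alpha - SNFS_C:.3f}) numbers")
```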

3 Targets for the SNFS Factorization Factory 

3.1 Target Set 

For our SNFS factorization factory experiment we chose to factor the Mersenne numbers $2^n - 1$ with $1000 < n < 1200$ that had not yet been fully factored, i.e., the seventeen numbers $2^n - 1$ with $n \in S$ as in the Introduction. We write $S = S_I \cup S_{II}$, where $S_I$ is our first batch, containing exponents that are $\pm 1 \bmod 8$, and $S_{II}$ is the second batch, with exponents that are $\pm 3 \bmod 8$. Thus

$S_I = \{1007, 1009, 1081, 1111, 1129, 1151, 1153, 1159, 1177, 1193, 1199\}$,

and

$S_{II} = \{1093, 1109, 1117, 1123, 1147, 1171\}$.

Once these numbers have been factored, only one unfactored Mersenne number with $n < 1200$ remains, namely $2^{991} - 1$. It can simply be dealt with using an individual SNFS effort, like the others with $n < 1000$ that were still present when we started our project. Our approach would have been suboptimal for these relatively small $n$.

Around 2009, when we were gearing up for our project, there were several more exponents in the range [1000, 1200]. Before actually starting, we first used the ECM in an attempt to remove Mersenne numbers with relatively small factors, and managed to fully factor five of them [4]: one with exponent 1 mod 8 and four with exponents ±3 mod 8. Three, all with exponents ±3 mod 8, were later factored by Ryan Propper (using the ECM, [34]) and were thus removed from $S_{II}$. Some other exponents which were easier for the SNFS were taken care of by various contributors as well, after which the above seventeen remained.


3.2 Polynomial Selection for the Target Set 

We used two different algebraic polynomials: $f_I = X^8 - 2$ for $n \equiv \pm 1 \bmod 8$ in $S_I$ and $f_{II} = X^8 - 8$ for $n \equiv \pm 3 \bmod 8$ in $S_{II}$. This leads to the common roots $m_n$ and rational polynomials $g_n$ corresponding to $n$ as listed in Table 1. Relations were collected using two sieves (one for $f_I$, shared by eleven $n$-values, and one for $f_{II}$, shared by six $n$-values) and seventeen factorization trees (one for each $g_n$), as further explained in Section 4. Furthermore, in an attempt to reduce the effort to process the resulting matrix, additional relations were collected for $n \in \{1177, 1199\}$. These used the algebraic polynomial $f'_I$, as specified in Table 1 along with the common roots $m'_n$ and rational polynomials $g'_n$. Although $n = 1177$ and $n = 1199$ share $f'_I$, it turned out to be more convenient to obtain the additional relations by using the vanilla all-sieving approach from [?] twice, cf. Section 4.4.

Another possibility would have been to select the single degree 6 polynomial $X^6 - 2$. Its relatively low degree and very small coefficients lead to a huge number of smooth algebraic values, all with a relatively large rational counterpart (again due to the low degree). Atypically, rational sieving could have been appropriate, whereas due to large cofactor sizes rational cofactoring would be relatively costly. Overall, degree 8 can be expected to work faster, despite the fact that it requires two algebraic polynomials. Degree 7 would require three algebraic polynomials and may be even worse than degree 6 for our sets of numbers. It would, however, have had the advantage that numbers of the form $2^n + 1$ could have been included too.
Table 1. The shared algebraic polynomials, roots, and rational polynomials for the 11 + 6 = 17 Mersenne numbers 2^n - 1 considered here

  f_I = X^8 - 2                                 f_II = X^8 - 8
  n      n mod 8   m_n       g_n                n      n mod 8   m_n       g_n
  1007   -1        2^126     X - 2^126          1093   -3        2^137     X - 2^137
  1111   -1        2^139     X - 2^139          1109   -3        2^139     X - 2^139
  1151   -1        2^144     X - 2^144          1117   -3        2^140     X - 2^140
  1159   -1        2^145     X - 2^145          1123    3        2^-140    2^140 X - 1
  1199   -1        2^150     X - 2^150          1147    3        2^-143    2^143 X - 1
  1009    1        2^-126    2^126 X - 1        1171    3        2^-146    2^146 X - 1
  1081    1        2^-135    2^135 X - 1
  1129    1        2^-141    2^141 X - 1
  1153    1        2^-144    2^144 X - 1
  1177    1        2^-147    2^147 X - 1
  1193    1        2^-149    2^149 X - 1

  f'_I = X^5 + X^4 - 4X^3 - 3X^2 + 3X + 1
  n      m'_n                 g'_n
  1177   2^107 + 2^-107       2^107 X - (2^214 + 1)
  1199   2^109 + 2^-109       2^109 X - (2^218 + 1)
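The entries of Table 1 can be verified with exact integer arithmetic. In the sketch below the helper names are hypothetical, and $2^{-k}$ is computed as a modular inverse via Python's three-argument `pow`. For the degree 5 rows we check the root modulo $(2^n - 1)/(2^k - 1)$ with $n = 11k$, rather than modulo $2^n - 1$. The reason is that $x = 2^k$ satisfies $x^5 f'_I(x + x^{-1}) = \Phi_{11}(x) = (2^n - 1)/(2^k - 1)$, so that quotient is the modulus for which the common root is guaranteed. This remark is our own observation and is not stated in the table.

```python
S = [1007, 1009, 1081, 1093, 1109, 1111, 1117, 1123, 1129, 1147, 1151, 1153,
     1159, 1171, 1177, 1193, 1199]


def table1_entry(n: int) -> tuple[int, int, int]:
    """(c, k, sign) with f = X^8 - c and m_n = 2^(sign * k), following Table 1."""
    r = n % 8
    if r not in (1, 3, 5, 7):
        raise ValueError("n must be odd")
    c = 2 if r in (1, 7) else 8                # f_I = X^8 - 2, f_II = X^8 - 8
    if r in (5, 7):                            # n = -3 or -1 (mod 8): m_n = 2^k
        return c, (n + 8 - r) // 8, 1
    return c, (n - r) // 8, -1                 # n = 1 or 3 (mod 8): m_n = 2^-k


def check_degree8(n: int) -> bool:
    """f(m_n) = 0 and g_n(m_n) = 0 modulo N = 2^n - 1."""
    N = 2**n - 1
    c, k, sign = table1_entry(n)
    m = pow(2, sign * k, N)                    # a negative exponent gives the modular inverse
    g = m - pow(2, k, N) if sign > 0 else pow(2, k, N) * m - 1
    return (pow(m, 8, N) - c) % N == 0 and g % N == 0


def check_degree5(n: int, k: int) -> bool:
    """f'_I(m'_n) = 0 and g'_n(m'_n) = 0 modulo M = (2^n - 1)/(2^k - 1), where n = 11k."""
    M = (2**n - 1) // (2**k - 1)
    x = pow(2, k, M)
    m = (x + pow(x, -1, M)) % M                # m'_n = 2^k + 2^-k
    f = m**5 + m**4 - 4 * m**3 - 3 * m**2 + 3 * m + 1
    g = x * m - (pow(2, 2 * k, M) + 1)         # g'_n = 2^k X - (2^(2k) + 1)
    return f % M == 0 and g % M == 0


print(sorted(n for n in S if n % 8 in (1, 7)))  # S_I
print(sorted(n for n in S if n % 8 in (3, 5)))  # S_II
print(all(check_degree8(n) for n in S), check_degree5(1177, 107), check_degree5(1199, 109))
```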


4 Relation Collection for the Target Set 

4.1 Integrating the Precomputation 

The first step of Coppersmith’s factorization factory is the preparation and storage of a precomputed table of pairs corresponding to smooth rational polynomial values. With the parameters from [7], this table contains $L(1.639)$ pairs. Assuming composites of relevant sizes, this is too large to be practical. Suppose we apply Coppersmith’s idea, as suggested in the second to last paragraph of Section 2.4, to a relatively small number of composites with the original NFS parameter choices. The table would then contain $L(1.683)$ pairs, which is even worse.

Here we can avoid excessive storage requirements. First of all, with the original SNFS parameter choices the table would contain “only” $L(1.145)$ pairs corresponding to smooth algebraic polynomial values, because we are using the factorization factory for the SNFS with a shared algebraic polynomial. Though better, this is still impractically large. Another effect in our favor is that we are using degree 8 polynomials. This is a relatively large degree compared to what the asymptotic runtime analysis suggests: for our $N$-values, the integer closest to $(3\log(N)/(2\log(\log(N))))^{1/3}$ would be 6. A larger degree leads to larger algebraic values, fewer smooth values, and thus fewer values to be stored.

Most importantly, however, we know our set of target numbers in advance. 
This allows us to process precomputed pairs right after they have been generated, and to keep only those that also lead to a smooth rational polynomial value. With $t$ numbers to be factored and $L(1.526/2)$ as smoothness bound (cf. Section 2.2), this reduces the storage requirements from $L(1.526)L(-1.526/4) = L(1.145)$ to $t\,L(1.526)L(-1.526/2) = t\,L(0.763)$. For our target sets this is only on the order of TBs (less than six TBs for $S_{II}$).

The precomputation and the further processing are described separately. 

4.2 Algebraic Sieving 

For the sieving of the polynomial $f_I = X^8 - 2$ from Section 3.2, we used a search space of approximately $2^{66}$ pairs and varying smoothness bounds. At most two larger primes less than $2^{37}$ were allowed in the otherwise smooth $f_I$-values.

The sieving task is split up into a large number of subtasks. Given a root $z$ of $f_I$ modulo a large prime number $q$, a subtask consists of finding pairs $(a, b)$ for which $a/b \equiv z \bmod q$ (implying that $q$ divides $b^8 f_I(a/b)$) and such that the quotient $b^8 f_I(a/b)/q$ is smooth (except for the large primes). Smoothness is with respect to the largest $h \cdot 10^8$ less than $q$, with $h \in \{3, 4, 6, 8, 12, 15, 20, 25, 30, 35\}$.

Pairs $(a, b)$ for which $a/b \equiv z \bmod q$ form a two-dimensional lattice of index $q$ in $\mathbb{Z}^2$ with basis $(q, 0)^T, (z, 1)^T$. After finding a reduced basis $u, v \in \mathbb{Z}^2$ for the lattice, the intersection of the original search space and the lattice is approximated as $\{(a, b)^T = iu + jv : i, j \in \mathbb{Z},\ |i| < 2^I,\ 0 \le j < 2^J\}$. The bounds $I, J \in \mathbb{Z}_{>0}$ were chosen such that $I + J + \log_2(q) \approx 65$ and such that $\max(|a|) \approx \max(b)$, thus taking the relative lengths of $u$ and $v$ into account. (Rather, these bounds “are ideally” chosen this way, as this is what we converged to in the course of our experiment.) Sieving takes place in a size $2^{I+J+1}$ rectangular region of the $(i, j)$-plane while avoiding storage for the (even, even) locations, as described in [12]. After the sieving, all $f_I$-values corresponding to the reported locations are divided by $q$ and trial-divided as also described in [12], allowing at most two prime factors between $q$ and $2^{37}$. Allowing three large primes turned out to be counterproductive. It gave either slightly more relations at considerably increased sieving time, or many more relations at the expense of a skyrocketing cofactoring effort.
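The reduction step and the map from sieve coordinates $(i, j)$ to pairs $(a, b)$ can be sketched as follows. The sketch uses classical Lagrange-Gauss reduction in dimension two, which is one standard way to obtain such a reduced basis; we do not claim it is the exact routine of [12]. The special-$q$ and root in the example are arbitrary. The sketch neither chooses $I$ and $J$ nor normalizes the sign of $b$.

```python
def dot(x: tuple[int, int], y: tuple[int, int]) -> int:
    return x[0] * y[0] + x[1] * y[1]


def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a basis (u, v) of a lattice in Z^2.
    Returns (u, v) with |u| <= |v| and |<u, v>| <= |u|^2 / 2."""
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        nu = dot(u, u)
        mu = (2 * dot(u, v) + nu) // (2 * nu)      # nearest integer to <u,v>/<u,u>
        v = (v[0] - mu * u[0], v[1] - mu * u[1])
        if dot(v, v) >= nu:
            return u, v
        u, v = v, u                                 # v became shorter: swap and repeat


def special_q_basis(q: int, z: int):
    """Reduced basis of {(a, b) : a = z*b (mod q)}, spanned by (q, 0) and (z, 1)."""
    return gauss_reduce((q, 0), (z % q, 1))


def to_ab(u, v, i: int, j: int) -> tuple[int, int]:
    """Sieve coordinates (i, j) -> lattice point (a, b) = i*u + j*v."""
    return (i * u[0] + j * v[0], i * u[1] + j * v[1])


q, z = 1_000_003, 123_457          # arbitrary special-q and root, for illustration only
u, v = special_q_basis(q, z)
print(u, v)                         # both vectors have length on the order of sqrt(q)
```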

Each $(a, b)$ with smooth algebraic polynomial value resulting from subtask $(q, z)$ induces a pair $(-a, b)$ with smooth algebraic polynomial value for subtask $(q, -z)$. Subtasks thus come in pairs: it suffices to sieve for one subtask and to recover all smooth pairs for the other subtask. For $n > 1151$ we used most $q$-values with $4 \cdot 10^8 < q < 8 \cdot 10^9$ (almost $2^{33}$), resulting in about 157 million pairs of subtasks. For the other $n$-values we used fewer pairs of subtasks: about 126 million for $n \in \{1007, 1009\}$ and about 143 million for the others.
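The pairing of subtasks rests on $f_I(X) = X^8 - 2$ being an even polynomial. Its roots modulo $q$ come in pairs $\pm z$, and $b^8 f_I(-a/b) = b^8 f_I(a/b)$. The toy check below illustrates both facts, using brute-force root finding for small $q$ only; the helper names are hypothetical.

```python
def roots_mod_q(q: int, c: int = 2, degree: int = 8) -> list[int]:
    """All z in [0, q) with z**degree = c (mod q); brute force, toy q only."""
    return [z for z in range(q) if pow(z, degree, q) == c % q]


def mirror(pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Map pairs (a, b) of subtask (q, z) to pairs (-a, b) of subtask (q, -z)."""
    return [(-a, b) for a, b in pairs]


q = 7
print(roots_mod_q(q))               # [3, 4], and 4 = -3 (mod 7)
found = [(a, b) for b in range(1, 4) for a in range(-20, 21) if (a - 3 * b) % q == 0]
print(found[:3], mirror(found)[:3])
```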

Subtasks are processed in disjoint batches, each consisting of all (prime, root) pairs for the primes in an interval of length 2500 or 10 000. Larger intervals are used for larger $q$-values, because the latter are processed faster: their sieving region is smaller (cf. above), and their larger smoothness bounds require more memory and thus more cores. After completion of a batch, the resulting pairs are inspected for smoothness of their applicable rational polynomial values, as further described below. Processing the batches, not counting the rational smoothness tests, required about 2367 core years. It resulted in $1.57 \cdot 10^{13}$ smooth algebraic values, and thus, for each $n \in S_I$, at most twice that many values to be inspected for rational smoothness. Storage of the $1.57 \cdot 10^{13}$ values (in binary format at five bytes per value) would have required 70 TB. As explained in Section 4.1, we avoided these considerable storage requirements by processing the smooth algebraic values almost on-the-fly. This also allowed the use of a more relaxed text format at about 20 bytes per value.

Sieving for $n \in S_{II}$ was done in the same way. For the polynomial $f_{II} = X^8 - 8$ and $n \in \{1147, 1171\}$, about 118 million pairs of subtasks were processed for most $q$-values with $3 \cdot 10^8 < q < 5.45 \cdot 10^9$. For the other $n$-values in $S_{II}$, about 94% to 96% of that range of $q$-values sufficed. Overall, sieving for $n \in S_{II}$ required 1626 core years and resulted in $1.16 \cdot 10^{13}$ smooth algebraic values.


4.3 Rational Factorization Trees 

Each time a batch of $f_I$-sieving subtasks is completed (cf. Section 4.2), the pairs $(a, b)$ produced by it are partitioned over four initially empty queues $Q_{34}$, $Q_{35}$, $Q_{36}$, and $Q_{37}$. If the largest prime in the factorization of $b^8 f_I(a/b)$ has bitlength $i$ for $i \in \{35, 36, 37\}$, then the pair is appended to $Q_i$; all remaining pairs are appended to $Q_{34}$.
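The queue bookkeeping is simple enough to sketch directly. The illustration below assumes each sieving output arrives with the largest prime of its $f_I$-value already known. `partition` files pairs by bitlength, and `pairs_for` forms the union $Q_{34} \cup \dots \cup Q_{a(n)}$ that is tested for a given $n$ (with $a(n)$ from Table 2). All names are hypothetical.

```python
def queue_label(largest_prime: int) -> int:
    """Q35, Q36 or Q37 by bitlength of the largest prime; everything else goes to Q34."""
    bits = largest_prime.bit_length()
    return bits if bits in (35, 36, 37) else 34


def partition(batch):
    """batch: iterable of ((a, b), largest prime of b^8 f_I(a/b))."""
    queues = {34: [], 35: [], 36: [], 37: []}
    for pair, largest_prime in batch:
        queues[queue_label(largest_prime)].append(pair)
    return queues


def pairs_for(queues, a_n: int):
    """Union Q34 u ... u Q_{a(n)}: the pairs whose rational values are tested for n."""
    return [pair for i in range(34, a_n + 1) for pair in queues[i]]


batch = [((1, 2), 101), ((3, 4), (1 << 34) + 1), ((5, 6), (1 << 35) + 1), ((7, 8), (1 << 36) + 1)]
queues = partition(batch)
print(pairs_for(queues, 35))        # pairs with largest prime below 2^35
```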

After partitioning the new pairs among the queues, the following is done for each $n \in S_I$ (cf. Section 3.1). For all pairs $(a, b)$ in $\bigcup_{i=34}^{a(n)} Q_i$, with $a(n)$ as in Table 2, the rational polynomial value $bg_n(a/b)$ (with $g_n$ as in Table 1) is tested for smoothness. If $bg_n(a/b)$ is smooth, then $(a, b)$ is included in the collection of relations for the factorization of $2^n - 1$; otherwise $(a, b)$ is discarded. The smoothness test for the $bg_n(a/b)$-values is conducted simultaneously for all pairs $(a, b) \in \bigcup_{i=34}^{a(n)} Q_i$ using a factorization tree as in [?, Section 4] (see also [3]). It uses $r(n) \cdot 10^8$ and $2^{\beta(n)}$ as smoothness and cofactor bounds, respectively, with $r(n)$ and $\beta(n)$ as in Table 2. Here the cofactor bound limits the number and the size of the factors in $bg_n(a/b)$ that are larger than the smoothness bound.
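Batch smoothness testing of this kind can be sketched with a product tree and a remainder tree, in the style of Bernstein's batch factorization. We assume, but cannot confirm from the text, that this captures the essence of the factorization tree cited above. Let $P$ be the product of the small primes and $z = P \bmod x$. Square $z$ modulo $x$ until the exponent $2^e$ satisfies $2^{2^e} \ge x$; then $\gcd(x, z^{2^e} \bmod x)$ is the smooth part of $x$. The sketch returns the smooth part and the cofactor. Applying a cofactor bound such as $2^{\beta(n)}$, and splitting the cofactor, are left out, and all helper names are hypothetical.

```python
from math import gcd, prod


def product_tree(xs: list[int]) -> list[list[int]]:
    """Levels of a product tree: level 0 is xs, the last level is [prod(xs)]."""
    tree = [list(xs)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree


def remainders(P: int, xs: list[int]) -> list[int]:
    """P mod x for every x in xs, reducing top-down through the product tree."""
    tree = product_tree(xs)
    rems = [P % tree[-1][0]]
    for level in reversed(tree[:-1]):
        rems = [rems[i // 2] % x for i, x in enumerate(level)]   # parent of i is i // 2
    return rems


def batch_smooth_parts(xs: list[int], primes: list[int]) -> list[tuple[int, int]]:
    """(smooth part, cofactor) of every positive x in xs with respect to `primes`."""
    if not xs:
        return []
    result = []
    for x, z in zip(xs, remainders(prod(primes), xs)):
        y, e = z, 0
        while (1 << (1 << e)) < x:      # square until 2^(2^e) >= x
            y = y * y % x
            e += 1
        s = gcd(x, y)
        result.append((s, x // s))
    return result


def trial_smooth_part(x: int, primes: list[int]) -> tuple[int, int]:
    """Reference: (smooth part, cofactor) by plain trial division."""
    s, v = 1, x
    for p in primes:
        while v % p == 0:
            v //= p
            s *= p
    return s, v


primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(batch_smooth_parts([360, 2**61 - 1, 47**9 * 53], primes))
```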

For all $n \in S_I$, Table 2 lists the runtimes together with four counts. These are the numbers of relations found, of free relations [23], and of relations after duplicate removal (and inclusion of the free relations). The fourth is the number of prime ideals that occur in the relations before the first singleton removal; this number is the actual dimension of the exponent vectors. All resulting raw matrices are over-square. For $n \in \{1193, 1199\}$ the over-squareness is relatively small. For $n = 1193$ we just dealt with the resulting rather large filtered matrix. For $n = 1199$, and for $n = 1177$ as well, additional sieving was done, as further discussed in the section below. The unusually high degree of over-squareness for the smaller $n$-values is a consequence of the large amount of data that had to be generated for the larger $n$-values. That data could be included for the smaller $n$-values at little extra cost.

Completed batches of subtasks for $f_{II}$-sieving were processed in the same way. The results are listed in Table 2.


4.4 Additional Sieving 

In an attempt to reduce the size of the (filtered) matrix, we collected additional relations for $n \in \{1177, 1199\}$ using the degree 5 algebraic polynomial $f'_I$ and the rational polynomials $g'_n$ from Table 1. For various reasons these two $n$-values (though they share $f'_I$) were treated separately, using the software from [13].

Table 2.

                                         relations                                          occurring
  n      a(n)  r(n)  β(n)  core years    found            free           total unique      prime ideals
  1007   34    5     99    26            6 157 265 485    47 681 523     4 083 240 054     1 488 688 670
  1009   34    5     99    26            6 076 365 897    47 681 523     4 030 378 014     1 487 997 805
  1081   35    5     103   48            7 704 145 069    92 508 436     5 484 250 026     2 828 752 381
  1111   35    5     103   46            5 636 554 807    92 508 436     4 045 778 202     2 744 898 588
  1129   35    5     103   47            4 860 167 788    92 508 436     3 447 412 400     2 690 405 347
  1151   36    5     105   77            9 026 908 346    179 644 953    6 878 035 126     5 229 081 896
  1153   36    5     105   78            8 919 329 699    179 644 953    6 798 580 785     5 219 976 433
  1159   36    5     105   78            8 494 336 817    179 644 953    6 454 287 572     5 179 538 761
  1177   37    20    138   140           15 844 796 536   349 149 710    12 687 801 912    10 098 132 272
  1193   37    20    141   171           13 873 940 124   349 149 710    11 120 476 664    9 912 486 202
  1199   37    20    141   169           13 201 986 116   349 149 710    10 600 157 337    9 795 656 570
  core years for n ∈ S_I: 906

  1093   35    5     103   37            5 380 284 567    92 508 436     3 777 018 420     2 736 825 054
  1109   36    5     105   55            9 621 428 465    179 644 953    7 102 393 219     5 134 440 256
  1117   36    5     105   55            8 930 755 992    179 644 953    6 762 813 242     5 220 018 492
  1123   36    5     105   54            8 686 858 952    179 644 953    6 567 794 152     5 197 770 153
  1147   37    20    138   122           15 404 494 545   349 149 710    12 096 909 112    9 967 719 536
  1171   37    20    138   115           12 240 930 101   349 149 710    9 688 750 293     9 556 433 885
  core years for n ∈ S_II: 438

For n = 1177 we used on the rational side smoothness bound 3·10^8, cofactor bound 2^109, and large factor bound 2^37. On the algebraic side these numbers were 5·10^8, 2^74, and 2^37. Using large primes q ∈ [3·10^8, 3.51·10^8] on the rational side (as opposed to the algebraic side above) we found 1 640 189 494 relations, of which 1 606 180 461 remained after duplicate removal. With 1 117 302 548 free relations this led to a total of 2 723 483 009 additional relations. With the 12 687 801 912 relations found earlier, this resulted in 15 411 284 921 relations in total, involving 15 926 778 561 prime ideals. Although this is not over-square (whereas the earlier relation set for n = 1177 from Section 4.3 was over-square), the new free relations contained many singleton prime ideals, so that after singleton removal the matrix was easily over-square. The resulting filtered matrix was small enough.

For n = 1199 the rational smoothness bound is 4·10^8. All other parameters are the same as for n = 1177. After processing the rational large primes q ∈ [4·10^8, 6.85·10^8] we had 6 133 381 386 degree 5 relations (of which 5 674 876 905 were unique) and 1 117 302 548 free relations. This led to 17 392 336 790 relations with 15 955 331 670 prime ideals and a small enough filtered matrix.
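The relation bookkeeping for the additional sieving can be re-checked with a few lines of Python. The counts below are copied from the two paragraphs above and from the "total unique" column of Table 2; "over-square" is taken here to mean simply more relations than occurring prime ideals, before any singleton removal.

```python
# Counts copied from the text and Table 2 (not recomputed from relation files).
EARLIER = {1177: 12_687_801_912, 1199: 10_600_157_337}   # "total unique" in Table 2
FREE = 1_117_302_548                                       # free relations, both n


def over_square(relations, prime_ideals):
    """True if there are more relations than prime ideals (before singleton removal)."""
    return relations > prime_ideals


additional_1177 = 1_606_180_461 + FREE                     # unique new relations + free
total_1177 = EARLIER[1177] + additional_1177
total_1199 = EARLIER[1199] + 5_674_876_905 + FREE          # unique degree 5 + free

print(additional_1177, total_1177, total_1199)
print(over_square(total_1177, 15_926_778_561), over_square(total_1199, 15_955_331_670))
```

This prints 2723483009 15411284921 17392336790 and then False True, matching the statement that the n = 1177 set only became over-square after singleton removal.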

The overall reduction in the resulting filtered matrix sizes was modest, and 
we doubt that this additional sieving experiment, though interesting, led to an 
overall reduction in runtime. On the other hand, spending a few months (thus 
a few hundred core years) on additional sieving hardly takes any human effort, 
whereas processing (larger) matrices is (more) cumbersome. Another reason is 
that we have resources available that cannot be used for matrix jobs. 
4.5 Equipment Used

Relation collection for n ∈ S_I was done from May 22, 2010, until February 21, 2013, on clusters at EPFL as listed in Table 3: 82% on lacal_1 and lacal_2, 12% on pleiades, 3% on greedy, and 1.5% each on callisto and vega, spending 3273 (2367 + 906) core years. Furthermore, 65 and 327 core years were spent on lacal_1 and lacal_2 for additional sieving for n = 1177 and n = 1199, respectively. Thus a total of 3665 core years was spent on relation collection for n ∈ S_I.

Relation collection for n ∈ S_II was done from February 21, 2013, until September 11, 2014, on part of the XCG container cluster at Microsoft Research in Redmond, USA, and on clusters at EPFL: 46.5% on the XCG cluster, 45.5% on lacal_1 and lacal_2, 5% on castor, 2% on grid, and 1% on greedy, spending a total of 2064 (1626 + 438) core years. It followed the approach described above for S_I, except that data were transported on a regular 500 GB hard disk drive that was sent back and forth between Redmond and Lausanne via regular mail.
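As a consistency check of this accounting, the per-number core years of Table 2 add up to the 906 and 438 core years quoted here. In the sketch below all constants are copied from Table 2 and this section; reading 2367 and 1626 as the part of the effort not attributed to individual n in Table 2 is our interpretation, not a statement from the text.

```python
# Per-number core years from Table 2 and totals quoted in Section 4.5.
CORE_YEARS_SI = {1007: 26, 1009: 26, 1081: 48, 1111: 46, 1129: 47, 1151: 77,
                 1153: 78, 1159: 78, 1177: 140, 1193: 171, 1199: 169}
CORE_YEARS_SII = {1093: 37, 1109: 55, 1117: 55, 1123: 54, 1147: 122, 1171: 115}
REST_SI, REST_SII = 2367, 1626          # the other term in "2367 + 906" and "1626 + 438"
EXTRA_SIEVING = {1177: 65, 1199: 327}   # additional degree 5 sieving

per_n_SI = sum(CORE_YEARS_SI.values())
per_n_SII = sum(CORE_YEARS_SII.values())
total_SI = REST_SI + per_n_SI + sum(EXTRA_SIEVING.values())
total_SII = REST_SII + per_n_SII

# The cluster shares quoted in the text should add up to 100%.
shares_SI = [82, 12, 3, 1.5, 1.5]
shares_SII = [46.5, 45.5, 5, 2, 1]
print(per_n_SI, per_n_SII, total_SI, total_SII, sum(shares_SI), sum(shares_SII))
# → 906 438 3665 2064 100.0 100.0
```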


Table 3. Description of available hardware. We have 100% access to the equipment at LACAL and to 134 nodes of the XCG container cluster (which contains many more nodes) and limited access to the other resources. A checkmark (✓) indicates an InfiniBand network. All nodes have 2 processors.

location           | name                              | processor    | nodes | cores per node | cores | GHz  | GB RAM per node              | GB RAM per core | TB disk space
------------------ | --------------------------------- | ------------ | ----- | -------------- | ----- | ---- | ---------------------------- | --------------- | -------------
EPFL               | ✓ bellatrix                       | Sandy Bridge | 424   | 16             | 6784  | 2.2  | 32                           | 2               |
EPFL               | callisto                          | Harpertown   | 128   | 8              | 1024  | 3.0  | 32                           | 4               |
EPFL               | castor                            | Ivy Bridge   | 52    | 16             | 832   | 2.6  | 64 (50 nodes), 256 (2 nodes) | 4, 16           | 22
EPFL               | greedy                            | ~1000 mixed cores, ~1 GB RAM per core; 70% Windows, 25% Linux, 5% Mac
EPFL               | vega                              | Harpertown   | 24    | 8              | 192   | 2.66 | 16                           | 2               |
LACAL              | ✓ lacal_1                         | AMD          | 53    | 12             | 636   | 2.2  | 16                           | 1⅓              | 3
LACAL              | ✓ lacal_2                         | AMD          | 28    | 24             | 672   | 1.9  | 32                           | 1⅓              |
LACAL              | pleiades                          | Woodcrest    | 35    | 4              | 140   | 2.66 | 8                            | 2               |
LACAL              | storage server                    | AMD          | 1     | 24             | 24    | 1.9  | 32                           | 1⅓              | 58
Microsoft Research | part of the XCG container cluster | AMD          | 134   | 8              | 1072  | 2.1  | 32                           | 4               |
Switzerland        | grid                              | several clusters at several Swiss institutes

5 Processing the Matrices 





Although relation collection could be shared among the numbers, the matrices must all be treated separately. Several of them required an effort that is considerably larger than the matrix effort reported in [18]. There a 192 795 550 × 192 796 550 matrix with on average 144 non-zeros per column (in this section all sizes and weights refer to matrices after filtering) was processed on a wide variety of closely coupled clusters in France, Japan, and Switzerland, requiring four months of wall time and a tenth of the computational effort of the relation collection. Until now, that was the largest binary matrix effort in the public domain that we are aware of. The largest matrix processed here is about 4.5 times harder.
5.1 The Block Wiedemann Algorithm

Wiedemann’s Algorithm. Given a sparse r × r matrix M over the binary field F_2 and a binary r-dimensional vector v, we have to solve Mx = v (cf. Section 3). The minimal polynomial F of M on the vector space spanned by {M^0 v, M^1 v, M^2 v, ...} has degree at most r. Denoting its coefficients by F_i ∈ F_2 and assuming that F_0 = 1, we have $F(M)v = \sum_{i=0}^{r} F_i M^i v = 0$, so that x follows as $x = \sum_{i=1}^{r} F_i M^{i-1} v$. Wiedemann’s method [32] determines x in three steps. For any j with 1 ≤ j ≤ r the j-th coordinates of the vectors M^i v for i = 0, 1, 2, ... satisfy the linear recurrence relation given by the F_i. Thus, once the first 2r + 1 of these j-th coordinates have been determined using 2r iterations of matrix × vector multiplications (Step 1), the F_i can be computed using the Berlekamp-Massey method [25] (Step 2), where it may be necessary to compute the least common multiple of the results of a few j-values. The solution x then follows using another r matrix × vector multiplications (Step 3).
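The three steps can be tried out on a toy matrix. The Python sketch below is a minimal illustration under simplifying assumptions, not the authors' implementation: rows and vectors are stored as integer bitmasks, Step 2 uses the textbook binary Berlekamp-Massey algorithm, and instead of "a few j-values" it takes the least common multiple over all r coordinates, which always yields the minimal polynomial F of M on the span of the M^i v.

```python
def matvec(rows, v):
    """Matrix x vector over F_2: rows[i] is a bitmask of the columns set in row i,
    v is a bitmask, and bit i of the result is the parity of rows[i] & v."""
    out = 0
    for i, row in enumerate(rows):
        if bin(row & v).count("1") & 1:
            out |= 1 << i
    return out


def berlekamp_massey(s):
    """Binary Berlekamp-Massey: shortest LFSR (C, L) generating the bit list s.
    Bit i of C is c_i, with s[n] = c_1 s[n-1] + ... + c_L s[n-L] (mod 2) for n >= L."""
    C, B, L, m = 1, 1, 0, 1
    for n in range(len(s)):
        d = s[n]                                  # discrepancy at position n
        for i in range(1, L + 1):
            d ^= (C >> i) & 1 & s[n - i]
        if d == 0:
            m += 1
        elif 2 * L <= n:                          # the LFSR length must grow
            C, B, L, m = C ^ (B << m), C, n + 1 - L, 1
        else:
            C ^= B << m
            m += 1
    return C, L


def pdivmod(a, b):
    """Quotient and remainder of F_2[z] polynomials stored as bitmasks (bit i = z^i)."""
    q = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q |= 1 << shift
        a ^= b << shift
    return q, a


def pmul(a, b):
    out = 0
    while b:
        if b & 1:
            out ^= a
        a, b = a << 1, b >> 1
    return out


def plcm(a, b):
    g, h = a, b
    while h:                                      # Euclid: g becomes gcd(a, b)
        g, h = h, pdivmod(g, h)[1]
    return pmul(a, pdivmod(b, g)[0])


def wiedemann_solve(rows, v):
    """Solve Mx = v over F_2 for nonsingular M (toy, unblocked Wiedemann)."""
    r = len(rows)
    # Step 1: 2r products; keep only the coordinate sequences of M^i v, i = 0..2r.
    seqs = [[] for _ in range(r)]
    w = v
    for i in range(2 * r + 1):
        for j in range(r):
            seqs[j].append((w >> j) & 1)
        if i < 2 * r:
            w = matvec(rows, w)
    # Step 2: Berlekamp-Massey per coordinate; the LCM is the minimal polynomial F.
    F = 1
    for s in seqs:
        C, L = berlekamp_massey(s)
        P = 0
        for i in range(L + 1):                    # P(z) = z^L C(1/z)
            P |= ((C >> (L - i)) & 1) << i
        F = plcm(F, P)
    if not F & 1:
        raise ValueError("F_0 = 0, so M is singular on the span of the M^i v")
    # Step 3: x = sum_{i >= 1} F_i M^(i-1) v.
    x, w = 0, v
    for i in range(1, F.bit_length()):
        if (F >> i) & 1:
            x ^= w
        w = matvec(rows, w)
    return x
```

For example, with the nonsingular matrix M = [0b0011, 0b0110, 0b1100, 0b1000] (hypothetical toy data), wiedemann_solve(M, 0b1011) returns 0b1101, the unique solution; here F(z) = z^4 + 1, so x = M^3 v.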

Steps 1 and 3 run in time O(r·w(M)), where w(M) denotes the number of non-zero entries of M. With Step 2 running in time O(r^2), the effort of Wiedemann’s method is dominated by steps 1 and 3.

Block Wiedemann. The efficiency of Wiedemann’s conceptually simple method is considerably enhanced by processing several different vectors v simultaneously, as shown in [31]: on 64-bit machines, for instance, 64 binary vectors can be treated at the same time, at negligible loss compared to processing a single binary vector. Though this slightly complicates Step 2 and requires keeping the 64 first coordinates of each vector calculated per iteration in Step 1, it cuts the number of matrix × vector products in steps 1 and 3 by a factor of 64 and effectively makes Wiedemann’s method 64 times faster. This blocking factor of 64 can, obviously, be replaced by 64t for any positive integer t. This calculation can be carried out by t independent threads (or on t independent clusters, [1]), each processing 64 binary vectors at a time while keeping the 64t first coordinates per multiplication in Step 1, provided that the independent results of the t-fold parallelized first step are communicated to a central location for the Berlekamp-Massey step [1].
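The "negligible loss" comes from the fact that, over F_2, a sparse matrix × vector product costs one XOR per non-zero whether each coordinate is a single bit or a 64-bit word holding the same coordinate of 64 vectors. The Python sketch below illustrates this bit-slicing idea only (Python integers are not machine words, and the sparse row format and toy sizes are our own choices): it multiplies a sparse binary matrix by 64 vectors at once and checks the result against 64 separate products.

```python
import random


def block_matvec(rows, V):
    """Multiply a sparse binary matrix by 64 binary vectors at once.

    rows[i] lists the column indices of the non-zeros of row i, and V[j] is a
    64-bit word whose bit b is coordinate j of the b-th vector. Over F_2, word i
    of the product is the XOR of V[j] over the non-zero columns j of row i."""
    out = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= V[j]          # one XOR per non-zero, as for a single vector
        out.append(acc)
    return out


def single_matvec(rows, v):
    """Reference product for one vector given as a list of bits."""
    return [sum(v[j] for j in cols) & 1 for cols in rows]


random.seed(7)
r = 50                                                  # hypothetical toy dimension
rows = [random.sample(range(r), 5) for _ in range(r)]   # 5 non-zeros per row
V = [random.getrandbits(64) for _ in range(r)]          # 64 vectors, bit-sliced
W = block_matvec(rows, V)
for b in range(64):                                     # compare with 64 separate products
    v_b = [(V[j] >> b) & 1 for j in range(r)]
    assert [(W[i] >> b) & 1 for i in range(r)] == single_matvec(rows, v_b)
```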

As explained in [8, 18], a further speed-up in Step 1 may be obtained by keeping, for some integer k ≥ 1, the first 64kt coordinates per iteration (for each of the t independent 64-bit wide threads). This reduces the number of Step 1 iterations from $2r/(64t)$ to $(k+1)r/(64kt)$, while the number of Step 3 iterations remains unchanged at $r/(64t)$. However, it has a negative effect on Step 2, with time and space complexities growing as $(k+1)^{\mu} t^{\mu-1} r^{1+o(1)}$ and $(k+1)^2 t r$, respectively, for $r \to \infty$ and with μ the matrix multiplication exponent (we used μ = 3).
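The trade-off controlled by k can be tabulated directly from these formulas. The Python sketch below only evaluates the expressions quoted above (exact fractions for the iteration counts, with the common r^{1+o(1)} and r factors of Step 2 dropped); the dimension r = 384 000 and t = 6 are hypothetical values chosen to keep the arithmetic exact.

```python
from fractions import Fraction


def iteration_counts(r, t, k=1):
    """Step 1 and Step 3 iteration counts for blocking factor 64t when the first
    64kt coordinates are kept per iteration (formulas as quoted in the text)."""
    step1 = Fraction((k + 1) * r, 64 * k * t)
    step3 = Fraction(r, 64 * t)
    return step1, step3


def step2_growth(t, k, mu=3):
    """Growth factors (k+1)^mu t^(mu-1) and (k+1)^2 t of the Step 2 time and space."""
    return (k + 1) ** mu * t ** (mu - 1), (k + 1) ** 2 * t


r, t = 384_000, 6            # hypothetical dimension and number of threads
for k in (1, 2, 3):
    s1, s3 = iteration_counts(r, t, k)
    print(k, float(s1), float(s3), step2_growth(t, k))
```

With these values, going from k = 1 to k = 3 lowers the Step 1 count from 2000 to about 1333 iterations, leaves Step 3 at 1000, and raises the Step 2 time factor from 288 to 2304.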

Double Matrix Product. In all previous work that we are aware of, a single filtered matrix M is processed by the block Wiedemann method. This matrix M replaces the original matrix M_raw consisting of the exponent vectors, and is calculated as M = M_raw × M_1 × M_2 for certain filtering matrices M_1 and M_2. For most matrices here, we adapted our filtering strategy, calculated M'_1 = M_raw × M_1, and applied the block Wiedemann method to the r × r matrix M without actually calculating it, but by using M = M'_1 × M_2. Because Mv can be calculated as M'_1(M_2 v) at (asymptotic) cost w(M_2) + w(M'_1), this is advantageous if r·(w(M'_1) + w(M_2)) is lower than the product of the dimension and weight resulting from traditional filtering. Details about the new filtering strategy will be provided once we have more experience with it.
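The double matrix product only needs two sparse products per iteration. The toy Python sketch below (random sizes and densities are arbitrary assumptions, unrelated to the real filtered matrices) forms M = M'_1 × M_2 explicitly only to check that M'_1(M_2 v) equals Mv, and reports the two per-iteration costs w(M'_1) + w(M_2) and w(M).

```python
import random


def sparse_matvec(rows, v):
    """rows[i] lists the columns of the non-zeros in row i; v is a list of bits.
    Over F_2 the cost is one addition per non-zero entry."""
    return [sum(v[j] for j in cols) & 1 for cols in rows]


def sparse_matmul(A, B):
    """Explicit product A x B over F_2 in the same row format (used only for checking)."""
    product = []
    for cols in A:
        acc = set()
        for j in cols:
            acc ^= set(B[j])          # symmetric difference = addition over F_2
        product.append(sorted(acc))
    return product


def weight(rows):
    return sum(len(cols) for cols in rows)


random.seed(2014)
r, r_prime, delta = 30, 45, 2                                       # hypothetical toy sizes
M1p = [random.sample(range(r_prime), 4) for _ in range(r)]          # plays M'_1: r x r'
M2 = [random.sample(range(r + delta), 3) for _ in range(r_prime)]   # plays M_2: r' x (r + delta)
M = sparse_matmul(M1p, M2)

v = [random.randint(0, 1) for _ in range(r + delta)]
assert sparse_matvec(M, v) == sparse_matvec(M1p, sparse_matvec(M2, v))
print("cost per product: factored", weight(M1p) + weight(M2), "explicit", weight(M))
```

Whether the factored form wins depends entirely on the weights produced by the filtering; the sketch only shows that both routes compute the same product.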

Error Detection and Recovery. See m for the “folklore” methods we used. 


5.2 Matrix Results 

All matrix calculations were done at EPFL on the clusters with InfiniBand network (lacal_1, lacal_2, and bellatrix) and the storage server (cf. Table 3). Despite our limited access to bellatrix, it was our preferred cluster for steps 1 and 3 because its larger memory bandwidth (compared to lacal_1 and lacal_2) allowed us to run optimally on more cores at the same time while also cutting the number of core years by a factor of about two (compared to lacal_1). The matrix from [18], for instance, which would have required about 154 core years on lacal_1, would require less than 75 core years on bellatrix.

Table 4 lists most data for all matrices we processed, or are processing. Jobs were usually run on a small number of nodes (running up to five matrices at the same time), as that requires the least amount of communication and storage per matrix and minimizes the overall runtime. Extended wall times were and are of no concern. The Berlekamp-Massey step, for which there are no data in Table 4, was run on the storage server. Its runtime requirements varied from several days to two weeks, using just 8 of the 24 available cores, writing and reading intermediate results to and from disk to satisfy the considerable storage needs. For each of the numbers Step 2 thus took less than one core year.


6 Factorizations 

For most n the matrix solutions were processed in the usual way [26, 27, 12] to find the unknown factors of 2^n − 1. This required an insignificant amount of runtime. The software from [12] is, however, not set up to deal with more fields than the field of rational numbers and a single algebraic number field defined by a single algebraic polynomial (in our case f_I for n ∈ S_I and f_II for n ∈ S_II). Using this software for n ∈ {1177, 1199}, the values for which additional sieving was done for the polynomials f' and g'_n from Table 1, would have required a substantial amount of programming. To save ourselves this non-trivial effort we opted for the naive old-fashioned approach used for the very first SNFS factorizations, as described in [23, Section 3], of finding explicit generators for all first degree prime ideals in both number fields Q(⁵√2) and Q(ζ_17 + ζ_17^{-1}), up to the appropriate norms. Because both number fields have class number equal to one and the search for generators took, relatively speaking, an insignificant amount of time, this approach should have enabled us to deal quickly and conveniently with these two more complicated cases as well.
Table 4. Data about the matrices processed, as explained in Section 5.1, with M'_1, M_2, and M matrices of sizes r × r', r' × (r + δ), and r × (r + δ), respectively, for a relatively small positive integer δ. Runtimes in italics are estimates for data that were not kept and runtimes between parentheses are extrapolations based on work completed. Starting from Step 3 for n = 1151 a different configuration was used, possibly including some changes in our code, and the programs ran more efficiently. Until n = 1159 a blocking factor of 128 was used (so t must be even); for n ∈ {1177, 1193, 1199} ∪ S_II it was 64 in order to fit on 16 nodes. The green bars indicate the periods that the matrices were processed, on the green scale at the top. The red bars indicate the matrices that are currently being processed. Dates are in the format yymmdd.

[Timeline of processing periods (bars), scale from 121207 to 140914]

n       | r, r', δ (or r, δ)                          | weight(s)          | t | k | core years Step 1 | core years Step 3 | start - end
------- | ------------------------------------------- | ------------------ | - | - | ----------------- | ----------------- | -------------------------
1007    | r = 38 986 666, r' = 61 476 801, δ = 420     | 201.089r, 31.518r' | 12 | 3 | 3.5    | 2.6    | 121207 - 130106 (30 days)
1009    | r = 39 947 548, r' = 64 737 522, δ = 348     | 202.077r, 36.958r' | 12 | 2 | 3.9    | 2.6    | 130424 - 130610 (47 days)
1081    | r = 79 452 919, r' = 122 320 052, δ = 1624   | 183.296r, 15.332r' | 16 | 2 | 20.3   | 13.5   | 130130 - 130311 (41 days)
1111    | r = 108 305 368, r' = 167 428 008, δ = 1018  | 180.444r, 13.887r' | 24 | 2 | 41.8   | 30.6   | 130109 - 130611 (154 days)
1129    | r = 132 037 278, r' = 204 248 960, δ = 341   | 180.523r, 13.434r' | 16 | 2 | 64.8   | 44.4   | 121231 - 130918 (262 days)
1151    | r = 164 438 818, r' = 253 751 725, δ = 911   | 174.348r, 11.810r' | 12 | 2 | 130.7  | 38.3   | 130316 - 131210 (270 days)
1153    | r = 168 943 024, r' = 260 332 296, δ = 1830  | 169.419r, 11.014r' | 8  | 2 | 75.4   | 43.3   | 130326 - 131026 (215 days)
1159    | r = 179 461 813, r' = 276 906 625, δ = 1278  | 174.179r, ?r'      | ?  | 2 | 87.0   | 58.0   | 130808 - 140207 (184 days)
1177    | r = 192 693 549, r' = 297 621 101, δ = 1043  | 216.442r, 19.457r' | 4  | 3 | 89.3   | 74.1   | 140119 - 140525 (127 days)
1193    | r = 297 605 781, δ = 1024                    | 272.267r           | 6  | ? | 129.5  | 105.3  | 131029 - 140819 (295 days)
1199    | r = 270 058 949, δ = 1064                    | 217.638r           | 6  | 3 | 104.8  | (86.0) | started 14062?
n ∈ S_I |                                             |                    |    |   | 751.0  | 498.7  | total 1249.7
1093    | r = 90 140 482, r' = 138 965 105, δ = ?      | 204.151r, 16.395r' | 8  | 3 | 13.4   | 10.1   | 140731 - 140912 (44 days)
1109    | r = 106 999 725, r' = 164 731 867, δ = ?     | 216.240r, 15.976r' | 8  | 3 | 20.3   | (16.7) | started 140801 (> 45 days)
1117    | r = 117 501 821, r' = 182 813 008, δ = ?     | 202.310r, 15.638r' | 6  | 3 | (25.5) | (20.9) | started 140805 (> 41 days)
1123    | r = 124 181 748, r' = 192 010 818, δ = 3225  | 197.677r, 14.222r' | 4  | 3 | (29.4) | (24.1) | started 140819 (> 27 days)


For n = 1177, however, we ran into an unexpected glitch: the 244 congruences that were produced by the 256 matrix solutions (after dealing with small primes and units) were not correct modular identities involving squares of rational primes and first degree prime ideal generators. This means that either the matrix step failed and produced incorrect solutions, or incorrect columns (i.e., not corresponding to relations) were included in the matrix. Further inspection revealed that the latter was the case. It turned out that, due to a buggy adaptation to the dual number field case, incorrect “relations” containing unfactored composites (which, because of the speed requirements, are unavoidably produced by sieving and cofactorization) were used as input to the filtering step. When we started counting the number of bad inputs, extrapolation of early counts suggested quite a few more than 244 bad entries, implying that the matrix step might have to be redone, because the 244 incorrect congruences might not suffice to produce correct congruences (by combining incorrect congruences to remove the bad entries). We narrowly escaped because, due to circumstances beyond anyone’s control [29], the count unexpectedly slowed down and only 189 bad entries were found. This then led to a total of 195 correct congruences, after which the factorization followed using the approach described above.

The factorizations that we obtained so far, ten for n ∈ S_I followed by a single one for n ∈ S_II, are listed below: n, the lengths in binary and decimal of the unfactored part of 2^n − 1, the factorization date, the lengths of the smallest newly found prime factor, and the factor.

1007: 843-bit c254, Jan 8 2013, 325-bit p98:
45664833523052628586495213371442511740075371951182478448819785894752763553620148815526546415896369

1009: 677-bit c204, Jun 12 2013, 295-bit p89:
32801629399316220386255938566077541078836238345868341181567256008155638984594836583203447

1081: 833-bit c251, Mar 11 2013, 380-bit p115:
1439581090232360306724652721497221475801893594104335706767629109277502599083325989958974577353063372266168702537641

1111: 921-bit c278, Jun 13 2013, 432-bit p130:
9401699217426101126085627400537881688668923430306029902665947240112085572850557654128039535064932539432952669653208185411260693457

1129: 1085-bit c327, Sep 20 2013, 460-bit p139:
2682863551849463941555012235061302606113919542117141814168219065469741026973149811937861249380857772014308434017285472953428756120546822911

1151: 803-bit c242, Dec 12 2013, 342-bit p103:
8311919431039560964291634917977812765997001516444732136271000611174775264337926657343369109100663804047

1153: 1099-bit c331, Oct 28 2013, 293-bit p89:
10122360961247873953624190885178888629606889980435179249683524293313230115056983720103793

1159: 1026-bit c309, Feb 9 2014, 315-bit p95:
62999265036082335900111964701462000438592932517815660818451881915621154349210038027033309344287

1177: 847-bit c255, May 29 2014, 370-bit p112:
2015660787548923454662590205621123886970085761436021592942859847523108465523348455927947279783179798610711213193

1193: 1177-bit c355, Aug 22 2014, 346-bit p104:
85227326201314361823893776605433636670217425388311906457714409016049961507516230416822145599757462472729

1093: 976-bit c294, Sep 13 2014, 405-bit p122:
46116332943436452551540576315696985297990259869411311813222303231047196444160418969946791520558378694863913363980328293449
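Each entry above can be checked independently: p divides 2^n − 1 exactly when 2^n ≡ 1 (mod p), and Python's three-argument pow computes this quickly even for n around 1200 and factors of a few hundred bits. The helper names below are ours; the sketch is demonstrated on the small case 2^11 − 1 = 23 · 89.

```python
def divides_mersenne(n, p):
    """Return True if p > 1 divides 2^n - 1, i.e. if 2^n = 1 (mod p)."""
    return p > 1 and pow(2, n, p) == 1


def sizes(p):
    """Bit length and number of decimal digits, as in the '325-bit p98' labels above."""
    return p.bit_length(), len(str(p))


# Small sanity check: 2^11 - 1 = 2047 = 23 * 89.
print(divides_mersenne(11, 23), divides_mersenne(11, 89), divides_mersenne(11, 7), sizes(89))
# → True True False (7, 2)
```

Pasting a factor from the list into divides_mersenne(n, p) and sizes(p) is a cheap way to catch transcription errors in its digits.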

The total cost for the eleven factorizations for n ∈ S_I will be about 4915 core years, with relation collection estimated at 3665 core years and all matrices at about 1250 core years. Relation collection for n ∈ S_II required 2064 core years, and three of the five remaining matrices are currently being processed. Individual factorization using the SNFS would have cost ten to fifteen thousand core years for all n ∈ S_I and four to six thousand core years for all n ∈ S_II, so overall we expect a worthwhile saving. The completion date of the overall project depends on the resources that will be available to process the matrices. The online version [19] of this paper will be kept up-to-date with our progress.

With a smallest newly found factor of 89 decimal digits so far, and a largest factor ever found using the ECM of 83 decimal digits [33], it may be argued that our ECM preprocessing did not “miss” anything yet.

7 Conclusion 

We have shown that, given a list of properly chosen special numbers, their factorizations may be obtained using Coppersmith’s factoring factory with considerable savings in comparison to treating the numbers individually. Application of Coppersmith’s idea to general numbers looks less straightforward. Taking into account the effects of rational versus algebraic precomputation (giving rise to many more smooth values) and of our relatively large algebraic degree (lowering our number of precomputed values), extrapolation of the 70 TB disk space estimate given at the end of Section 4.2 suggests that an EB of disk space may be required if a set S of 1024-bit RSA moduli to be factored is not known in advance. This is not infeasible, but not yet within reach of an academic effort. Of course, these excessive storage problems vanish if S is known in advance. But the relative efficiency of current implementations of sieving compared to factorization trees suggests that |S| individual NFS efforts will outperform Coppersmith’s factorization factory, unless the moduli get larger. This is compounded by the effect of advantageously chosen individual roots, versus a single shared root.

Regarding the SNFS factorization factory applied to Mersenne numbers, the length of an interval of n-values for which a certain fixed degree larger than our d = 8 is optimal will be larger than our interval of n-values. And, as the corresponding Mersenne numbers 2^n − 1 will be larger than the ones here, fewer will be factored by the ECM. Thus, we expect that future table-makers, who may wish to factor larger Mersenne numbers, can profit from the approach described in this paper to a larger extent than we have been able to, unless of course better factorization methods or devices have emerged. Obviously, the SNFS factorization factory can be applied to other Cunningham numbers, or Fibonacci numbers, or yet other special numbers. We do not elaborate.
Abstract. In this paper, we revisit the recent small characteristic discrete logarithm algorithms. We show that a simplified description of the algorithm, together with some additional ideas, allows us to obtain an improved complexity for the polynomial time precomputation that arises during the discrete logarithm computation. With our new improvements, this is reduced to O(q^6), where q is the cardinality of the basefield we are considering. This should be compared to the best currently documented complexity for this part, namely O(q^7). With our simplified setting, the complexity of the precomputation in the general case becomes similar to the complexity known for Kummer (or twisted Kummer) extensions.

1 Introduction 

Recently, the computation of discrete logarithms in small characteristic finite fields has been greatly improved [Jou14, GGMZ13a, BGJT14], with the introduction of a new family of Index Calculus algorithms for this case. In the sequel, we call the algorithms from this family Frobenius Representation algorithms. Frobenius Representation algorithms can be seen as descendants of the pinpointing algorithm introduced in [Jou13a]. The first two Frobenius Representation algorithms appeared essentially simultaneously. One of them, proposed by Joux in [Jou14], was first used in a discrete logarithm record in F_{2^{1778}} announced on Feb 11, 2013 on the NMBRTHRY mailing list, while the first draft of the article describing the L(1/4) complexity analysis of the algorithm was posted as [Jou13b] on Feb 20, 2013. Between these two events, another Frobenius Representation algorithm with complexity L(1/3) was proposed in [GGMZ13b], with a record in F_{2^{1971}} announced on Feb 19, 2013 on the same mailing list. From an asymptotic point of view, the best current Frobenius Representation algorithm is the quasi-polynomial time algorithm proposed in [BGJT14]. In practice, many options are open depending on the exact finite field we want to address. However, there are currently many open questions about these algorithms. From a theoretical point of view, it would be extremely nice to remove the heuristic hypotheses that are used in the algorithms. A first step in this direction is proposed in [GKZ14b], with a simplified individual logarithms algorithm that only relies on the ability to descend finite field elements expressed by polynomials of even degree 2D to polynomials of degree D. Another theoretical question would be to get the complexity down to polynomial time instead of quasi-polynomial. From a practical point of view, the limiting step for setting records in the general case, as opposed to special cases such as Kummer extensions, is usually the computation of the logarithms of the initial factor base elements. When working over a base field F_q, the best documented complexity is O(q^7) (see for example [AMORH14]). However, some authors mention a higher complexity. Typically, for the computation performed in [GKZ14a] with q = 2^6, the authors explain that the dimension of the linear algebra is reduced from q^4 to q^4/24 = q^4/log_2(q^4). Asymptotically, with this approach the complexity would be O(q^9/log(q)^2). For specific cases such as Kummer extensions, the complexity is lower, of the order of O(q^6).

In this paper, we give a new variation which achieves complexity O(q^6) for the general case. Part of this work was already presented by the first-named author in several talks during the development of our algorithm. It is presented here in writing for the first time. In these earlier talks, the variation was described as a simplified version with degraded performance, the main reason being that using polynomials of degree up to D over F_q seems essentially equivalent to using linear polynomials over F_{q^d}, with d = D. However, instead of allowing us to compute logarithms in the field F_{q^{dk}} with k of the same order of magnitude as q, it only leads to logarithms in F_{q^k}, and we lose the extra factor of d in the field exponent, which came for free with the standard approach (with a value of d usually between 2 and 4). Also note that a similar correspondence between low degree polynomials over a large field and higher degree polynomials over a smaller field appears in [GKZ14b].

In order to make the algorithm efficient, D needs to be minimized. At first glance, it seems that we need to take at least D = 3 to bootstrap the computation. Our main contribution is that with this simplified approach, it is in fact possible, under a reasonable heuristic assumption, to reduce the degree of the polynomials in the initial factor base over F_q to D = 2. Once the initial factor base is computed, at a cost of O(q^5), we use it as a lever to obtain the logarithms of polynomials of degree D = 3 and D = 4 with a total cost of O(q^6). Using either the heuristic quasi-polynomial descent of [BGJT14] or the alternative version from [GKZ14b], it is possible to bring down arbitrary elements of F_{q^k} to this extended factor base formed of irreducible polynomials up to degree 4.

Outline of the Article. Like all recent discrete logarithm algorithms for small characteristic finite fields, our simplified setting has several phases:

> The Preliminary phase, which finds a representation of the target finite field.

> The Relation Collection and Linear Algebra phases, which recover the discrete logarithms of a small set of elements, the factor base.

> The Extension phase, specific to small characteristic finite fields, in which we obtain the discrete logarithms of a larger set containing the factor base. We call this new set the extended factor base.

> The Descent phase, which recovers the discrete logarithm of an arbitrary element of the finite field by rewriting it as a product of elements of the extended factor base.

Following this common structure, we introduce our simplified setting in Section 2. We then present in Section 3 the computation of the discrete logarithms of the factor base together with the Extension phase. Section 4 gives a short analysis of the total improved asymptotic complexity obtained. Finally, in Section 5 we illustrate the efficiency of the algorithm with a practical computation of discrete logarithms in the general case of a prime extension degree which does not divide¹ q(q + 1)(q − 1). More precisely, we perform the computation of the logarithms in F_{q^k} with q = 3^5 and extension degree k = 479 (the largest prime smaller than 2q).
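The choice of extension degree in this example is easy to confirm: with q = 3^5 = 243, the largest prime below 2q = 486 is 479, and since 479 is a prime larger than q + 1 it cannot divide q(q + 1)(q − 1). A minimal Python check using plain trial division:

```python
def is_prime(m):
    """Trial division; adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


q = 3 ** 5
k = max(m for m in range(2, 2 * q) if is_prime(m))    # largest prime smaller than 2q
print(q, k, (q * (q + 1) * (q - 1)) % k != 0)          # → 243 479 True
```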

2 Simplified Setting for Small Characteristic Finite Fields 

When trying to compute discrete logarithms in a given finite field, let us say F_{q^k}, the first step is to choose a convenient way to construct it. We first explain in Section 2.1 how Frobenius Representation algorithms represent the target field with the help of two polynomials h_0 and h_1. We then present an improved way to choose these two cornerstone polynomials in Section 2.2. Last but not least, we propose in Section 2.3 a simpler factor base. It is the combination of these two simplified choices that yields an improvement in the asymptotic complexity of the Relation Collection, Linear Algebra and Extension phases.

2.1 Frobenius Representation Algorithms 

Like all Frobenius Representation algorithms, the algorithm we propose relies on two key elements. The first element is the well-known fact that over F_q[X], the following polynomial identity holds:

$$\prod_{\alpha \in \mathbb{F}_q} (X - \alpha) = X^q - X. \tag{1}$$

The second element is to define the target finite field F_{q^k}, where we want to compute discrete logarithms, by determining two polynomials h_0 and h_1 of degree at most H and by requiring that there exists a monic irreducible polynomial I(X) of degree k over F_q[X] such that:

$$I(X) \text{ divides } h_1(X)\,X^q - h_0(X). \tag{2}$$


¹ The known special cases which are very efficient for records are Kummer extensions of degree dividing q − 1, twisted Kummer extensions with degree dividing q + 1, and Artin-Schreier extensions.
If θ denotes a root of I(X) in F_{q^k}, setting F_{q^k} = F_q[X]/(I(X)) = F_q(θ) gives a representation of the finite field that satisfies θ^q = h_0(θ)/h_1(θ). Since the map that raises an element of F_{q^k} to the power q is called the Frobenius map, this choice of representation explains the name Frobenius Representation that we use for this family of algorithms.
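Identity (1) can be confirmed numerically when q is prime, so that F_q is simply the integers modulo q; prime powers q would need extension-field arithmetic, which this small Python sketch deliberately leaves out.

```python
def poly_mul(a, b, p):
    """Product of two polynomials over F_p (coefficient lists, lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def product_of_linear_factors(p):
    """Coefficients of prod_{alpha in F_p} (X - alpha)."""
    prod = [1]
    for alpha in range(p):
        prod = poly_mul(prod, [(-alpha) % p, 1], p)
    return prod


def x_to_the_q_minus_x(p):
    """Coefficients of X^p - X over F_p."""
    coeffs = [0] * (p + 1)
    coeffs[p] = 1
    coeffs[1] = (coeffs[1] - 1) % p
    return coeffs


for p in (2, 3, 5, 7, 11, 13):
    assert product_of_linear_factors(p) == x_to_the_q_minus_x(p)
```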

The Dual Frobenius Representation Variant. There is an alternative option, proposed in [GKZ14a], for constructing the extension field, where we require that:

$$I(X) \text{ divides } h_1(X^q)\,X - h_0(X^q).$$

The advantage of this option is to allow a wider range of possible extension degrees k for a given basefield F_q. However, using this variation slightly complicates the description of the algorithm. With this variation, the finite field representation satisfies θ = h_0(θ^q)/h_1(θ^q). When referring to the variation by name, we will call it a dual Frobenius Representation or, equivalently, a Verschiebung Representation.

2.2 Improved Choice of h_0 and h_1

A Really Simple Construction. We recall that the usual choice is to take two quadratic polynomials, to allow the possibility of representing, at least heuristically, a large range of finite fields. Since we know that using linear polynomials for h_0 and h_1 does not allow such a large range, we propose a slightly different choice. We take for h_0 an affine polynomial and for h_1 a quadratic polynomial. We assume furthermore that the constant term of h_1 is equal to 0. Note that, by factoring out a constant in the defining Equation (2), we can assume, without loss of generality, that h_1 is monic. For simplicity of the presentation, it is convenient to rewrite:

$$h_0(X) = rX + s \quad\text{and}\quad h_1(X) = X(X + t). \tag{3}$$

A Useful Variant. Another natural option is to take for h_0 a quadratic polynomial with a constant term equal to 0 and for h_1 an affine polynomial. In this case, it is convenient to rewrite:

$$h_0(X) = X(X + w) \quad\text{and}\quad h_1(X) = uX + v. \tag{4}$$

At first sight, nothing indicates that one of the two choices is better, and in fact both are equivalent in terms of complexity. However, as we show in Section 3, the first one leads in practice to a simpler description of the algorithm. As a mnemonic, we can notice that (r, s, t) are the coefficients of the really simple construction whereas (u, v, w) are those of the useful variant.

2.3 Seeking a Natural Factor Base 

Once the representation of the target field is chosen, we need to fix the factor 
base. With the aim of simplifying the description of the algorithm, we propose 
to get rid of polynomials with coefficients in an extension field. 
Irreducible Polynomials with Coefficients in the Base Field. We choose a parameter $D$ and consider a factor base that contains all irreducible polynomials of degree at most $D$ in $\mathbb{F}_q[X]$. This should be compared with previous Frobenius Representation algorithms, which consider irreducible polynomials with coefficients in an extension of $\mathbb{F}_q$. To generate equations, we let $A$ and $B$ be two polynomials of degree at most $D$ and, using Equations © and ©, we write:

$$B(\theta)\prod_{a\in\mathbb{F}_q}\bigl(A(\theta)-aB(\theta)\bigr) = B(\theta)A(\theta)^q - A(\theta)B(\theta)^q = B(\theta)A(\theta^q) - A(\theta)B(\theta^q).$$


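For compactness, we associate the factor $B(\theta)$ with the point $\alpha = \infty$ at infinity on the projective line $\mathbb{P}_1(\mathbb{F}_q)$. This allows us to write the first product throughout the sequel as $\prod_{\alpha\in\mathbb{P}_1(\mathbb{F}_q)}\bigl(A(\theta)-\alpha B(\theta)\bigr)$. We also introduce the following notation: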

Definition 1. Let $D$ be an integer, and let $h_0, h_1, A, B$ be four polynomials such that $A$ and $B$ have degree at most $D$. Then $[A,B]_D$ is called the $D$-bracket of $A$ and $B$. It is defined as:

$$[A,B]_D(X) = h_1(X)^D\left(B(X)\,A\!\left(\frac{h_0(X)}{h_1(X)}\right) - A(X)\,B\!\left(\frac{h_0(X)}{h_1(X)}\right)\right).$$

Proposition 1. If $h_0$ and $h_1$ are polynomials of degree at most $H$ and if $A$ and $B$ are polynomials of degree at most $D$, then:

> $[A,B]_D$ is a polynomial of degree at most $(H+1)\cdot D$.

> The map $[\cdot,\cdot]_D$ is bilinear and antisymmetric. In particular, $[A,A]_D = 0$.

The proof of the two items of the proposition is straightforward. With these two notations, we rewrite the equality as:

$$\prod_{\alpha\in\mathbb{P}_1(\mathbb{F}_q)}\bigl(A(\theta)-\alpha B(\theta)\bigr) = \frac{[A,B]_D(\theta)}{h_1(\theta)^D}. \tag{5}$$


Since the numerator $[A,B]_D$ of the right-hand side of Equation (5) has bounded degree, under a classical heuristic the probability that it factors into irreducible polynomials of degree at most $D$ can be lower-bounded by a constant $p_H$. When using a dual Frobenius Representation, we similarly get:

$$\prod_{\alpha\in\mathbb{P}_1(\mathbb{F}_q)}\bigl(A(\theta)-\alpha B(\theta)\bigr) = \left(\frac{[A,B]_D(\theta)}{h_1(\theta)^D}\right)^{q}. \tag{6}$$
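As a concrete illustration of Definition 1 and Proposition 1, the following Python sketch computes $D$-brackets over a small prime field. It is a toy under stated assumptions, not the authors' implementation: the base field is taken to be $\mathbb{F}_7$, polynomials are coefficient lists with the constant term first, $h_0 = 2X + 3$ and $h_1 = X(X+1)$ are hypothetical choices for the simple construction, and all function names are ours. Only the polynomial $[A,B]_D$ is manipulated; no extension field is built.

```python
import random

P = 7  # assumption: toy prime field F_7 standing in for F_q

def trim(a):
    """Reduce mod P and drop leading zeros; a[i] is the coefficient of X^i."""
    a = [c % P for c in a]
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def scale(c, a):
    return trim([c * x for x in a])

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def ppow(a, e):
    out = [1]
    for _ in range(e):
        out = mul(out, a)
    return out

def degree(a):
    a = trim(a)
    return -1 if a == [0] else len(a) - 1

def bracket(A, B, D, h0, h1):
    """[A,B]_D = h1^D * (B * A(h0/h1) - A * B(h0/h1)), for deg A, deg B <= D."""
    def hom(F):  # h1^D * F(h0/h1) = sum_i F_i * h0^i * h1^(D-i), a genuine polynomial
        total = [0]
        for i, c in enumerate(trim(F)):
            total = add(total, scale(c, mul(ppow(h0, i), ppow(h1, D - i))))
        return total
    return add(mul(B, hom(A)), scale(-1, mul(A, hom(B))))

h0 = [3, 2]     # hypothetical h0(X) = 2X + 3 (affine)
h1 = [0, 1, 1]  # hypothetical h1(X) = X^2 + X = X(X + 1)
H, D = 2, 2

rng = random.Random(2014)
def random_poly(deg):
    return trim([rng.randrange(P) for _ in range(deg + 1)])

for _ in range(200):
    A, B, C = random_poly(D), random_poly(D), random_poly(D)
    AB = bracket(A, B, D, h0, h1)
    assert degree(AB) <= (H + 1) * D                       # degree bound
    assert add(AB, bracket(B, A, D, h0, h1)) == [0]        # antisymmetry
    assert bracket(A, A, D, h0, h1) == [0]                 # [A, A]_D = 0
    assert bracket(add(A, C), B, D, h0, h1) == add(AB, bracket(C, B, D, h0, h1))  # linearity

print(bracket([0, 1], [0, 0, 1], D, h0, h1))  # [X, X^2]_2
```

The last line prints `[0, 5, 2, 6, 5, 2]`, that is $[X, X^2]_2 = 2X^5 + 5X^4 + 6X^3 + 2X^2 + 5X$ for these toy parameters.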


Degree of the Factor Base Polynomials. In order to choose the parameter $D$, we have to balance three requirements: to lower the complexity of the linear algebra phase we need a small factor base, but we also need to be able to generate enough good equations² and to descend larger polynomials to polynomials of the factor base. The degree of the factor base polynomials must therefore not be too small, otherwise at least one of these two steps becomes impossible. Let us give more details about this degree.

The previous degree 3 barrier. When we consider the general case where $h_0$ and $h_1$ are polynomials of degree bounded by $H$, the analysis is as follows. The number of equations that can be generated is obtained by counting the number of pairs of polynomials $(A,B)$ that remain once we take into account the invariance of the construction under the action of $\mathrm{PGL}_2(\mathbb{F}_q)$. In other words, ignoring the cases where the degree is somehow reduced in the left-hand side of Equation (5) (see Appendix A for details), we can assume that:

$$A(X) = X^D + a(X) \quad\text{and}\quad B(X) = X^{D-1} + b(X),$$

where $a(X)$ and $b(X)$ have degree at most $D-2$. As a consequence, since polynomials of degree $D-2$ have $D-1$ coefficients, the number of good equations that can be generated in this manner is of the order of $p_H\cdot q^{2D-2}$. Moreover, the number of elements in the factor base, i.e. the number of irreducible polynomials of degree at most $D$, is close to $q^D/D$. To get more equations than unknowns in the linear algebra phase, i.e. to obtain $D\cdot p_H\cdot q^{D-2} \geq 1$, we need $D \geq 3$ unless the probability $p_H$ is greatly enlarged, as underlined in [GKZ14b].
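The counting argument can be reproduced numerically. The sketch below is a heuristic model only, with hypothetical function names: it takes the number of good equations to be $p_H\cdot q^{2D-2}$, as above, and compares it with the exact number of monic irreducible polynomials of degree at most $D$, computed with the classical Möbius formula. The sample values of $p_H$ are illustrative assumptions, not measured probabilities.

```python
def mobius(n):
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # n had a square factor
            result = -result
        d += 1
    return -result if n > 1 else result

def count_irreducible(q, n):
    """Number of monic irreducible polynomials of degree n over F_q."""
    return sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0) // n

def factor_base_size(q, D):
    """Monic irreducible polynomials of degree 1..D over F_q."""
    return sum(count_irreducible(q, d) for d in range(1, D + 1))

def enough_equations(q, D, p_H):
    """Heuristic test: about p_H * q^(2D-2) good equations versus the factor base size."""
    return p_H * q ** (2 * D - 2) >= factor_base_size(q, D)

q = 3 ** 5
print(factor_base_size(q, 2), enough_equations(q, 2, 0.3), enough_equations(q, 3, 0.05))
# prints: 29646 False True
```

With $D = 2$ the model needs $p_H$ of about $1/2$ or more, since the factor base already contains about $q^2/2$ quadratic polynomials, whereas with $D = 3$ a small constant probability suffices.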

As a consequence, the best complexity we can hope for when computing the logarithms of the factor base elements is of the order of $(q^D)^2\cdot q \geq q^7$. Note that, looking at the various existing records, this lower bound of $q^7$ is not always attained, since some computations need to enlarge the factor base to $D = 4$, which raises the complexity to $O(q^9)$. Typically, such an enlargement is performed in [GKZ14a], even if, thanks to a judicious use of Galois invariance, they reduce the cost of this enlargement compared to $O(q^9)$ by regrouping the degree 4 objects³ into groups of 24 conjugates.

The reason for this enlargement is that the known techniques for descending polynomials of degree larger than 4 to degree 4 do not work completely for descending degree 4 polynomials to degree 3: in most cases, only a fraction of the degree 4 irreducible polynomials can be reached in this manner. This is similar to the situation reported in [AMORH14], where only half of the quadratic polynomials over a cubic extension can be derived from linear polynomials with the descent algorithm.

Breaking the barrier. Following the above argument, for $D = 2$ we expect about $q^2/2$ irreducible polynomials and, assuming that $H = 2$, one would expect a value of $p_H$ well below $1/2$. Thus, without any improvement on the probability, the expected number of equations is too small compared to the number of unknowns, and it is not possible to derive the discrete logarithms of the small elements in this manner. Yet, in our simplified setting, the factor base consists of all the irreducible polynomials of degree at most 2 with coefficients in the base field. We explain in Section 3.1 how to get around this problem and recover all the discrete logarithms of the factor base.

2 We call good equations those equations of the form (5) in which both the right-hand and left-hand sides can be written with polynomials of the factor base only.

3 These objects are in fact quadratic polynomials over a degree 2 extension.

3 Improving Computations of the (Extended) Factor Base 

In this section, we present two contributions that reduce the global cost of the polynomial part of discrete logarithm computations. The first contribution, in Section 3.1, describes how we can adapt the use of Equation (5) to perform an initial computation with a reduced initial factor base corresponding to $D = 2$ for a cost $O(q^5)$. We also show, in Section 3.2, that once this is done, the enlargement to $D = 3$ can be performed at a reduced cost $O(q^6)$, instead of the expected $O(q^7)$.

The second contribution, presented in Section 3.3, is a new descent technique that only requires a small subset of the degree 4 irreducible polynomials in order to compute on the fly the logarithm of an overwhelming fraction of the other degree 4 polynomials. If enough memory is available, an adaptation of this technique also makes it possible to obtain the logarithms corresponding to an enlarged basis with $D = 4$. Both options can be performed with time complexity $O(q^6)$.


3.1 A Reduced Degree 2 Factor Base 

As previously said, if we choose a degree 2 factor base, it seems that we do not have enough good equations compared to the number of unknowns. We propose two approaches to get around this problem. First, we show that, thanks to our smaller degree polynomials $h_0$ and $h_1$, we can improve $p_H$, the bound on the probability of obtaining a good equation, by exhibiting systematic factors. In addition, we use another source of equations to complete the system. A secondary advantage is that this second source leads to much sparser equations than the use of Equation (5).

Improving the Probability $p_H$ Thanks to Systematic Factors. Once we have fixed $A(X) = X^D + a(X)$ and $B(X) = X^{D-1} + b(X)$, we see that both the left-hand side and the denominator of the right-hand side of Equation (5) or (6) can be written as products of elements of the factor base. So we have to analyze the probability that the numerator of the right-hand side, namely the $D$-bracket of $A$ and $B$, factors into a product of polynomials of degree at most 2.

The simple construction: $h_0$ affine and $h_1$ quadratic. Proposition 1 allows us to upper-bound the degree of $[A,B]_D$ by $(H+1)\cdot D$. As a consequence, for $H = 2$ and $D = 2$, this degree is at most 6. The probability that a random polynomial of degree 6 factors into terms of degree at most 2 is far too small to obtain enough equations. However, as mentioned in [GKZ14b], a systematic term appears in the factorization of $[A,B]_D(X)$. More precisely, we have the following result:
Lemma 1 (Systematic factor of a D-bracket). Let $A$ and $B$ be two polynomials of degree at most $D$. Then $[A,B]_D(X)$ is divisible by $Xh_1(X) - h_0(X)$.

Proof. By bilinearity, if $A(X) = \sum_{i=0}^{D} a_iX^i$ and $B(X) = \sum_{j=0}^{D} b_jX^j$, we can write $[A,B]_D = \sum_{i=0}^{D}\sum_{j=0}^{D} a_ib_j[X^i, X^j]_D$. Moreover, since $[\cdot,\cdot]_D$ is bilinear and antisymmetric, it is clear that $[X^i, X^j]_D = -[X^j, X^i]_D$ and $[X^i, X^i]_D = 0$. Thus, it suffices to consider the $D$-bracket of $X^i$ and $X^j$ where $i < j$. Let us compute:

$$\begin{aligned}
[X^i, X^j]_D &= h_1(X)^{D-j}\left(X^jh_0(X)^ih_1(X)^{j-i} - X^ih_0(X)^j\right)\\
&= h_1(X)^{D-j}X^ih_0(X)^i\left((Xh_1(X))^{j-i} - h_0(X)^{j-i}\right)\\
&= h_1(X)^{D-j}X^ih_0(X)^i\bigl(Xh_1(X) - h_0(X)\bigr)\sum_{k=1}^{j-i}h_0(X)^{k-1}(Xh_1(X))^{j-i-k}.
\end{aligned}$$

As a consequence, $Xh_1(X) - h_0(X)$ divides $[X^i, X^j]_D$ and the lemma follows. □

Thus, after dividing $[A,B]_D$ by this degree 3 systematic factor, the question is whether a polynomial of degree 3 factors into terms of degree at most 2. Assuming that it behaves as a random polynomial in this respect, we can lower-bound the probability by $2/3$ (see Appendix B). Since this is higher than $1/2$, we now have enough equations to compute the logarithms of the factor base.
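The constant $2/3$, and the percentages quoted later from Appendix B, can be recomputed under the usual heuristic. We assume (standard asymptotic model, not taken from the text) that, as $q$ grows, the probability that a random degree $n$ polynomial has all its irreducible factors of degree at most $m$ tends to the probability that a random permutation of $n$ elements has all its cycles of length at most $m$. The sketch sums Cauchy's cycle-type weights $1/\prod_k k^{c_k}c_k!$ exactly; the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial

def smooth_probability(n, m):
    """Limit as q -> infinity of the probability that a random degree-n polynomial over
    F_q has all its irreducible factors of degree at most m (cycle-type model)."""
    def rec(remaining, largest):
        # Sum of 1 / prod_k (k^c_k * c_k!) over partitions of `remaining` into parts of
        # size <= largest, choosing first the multiplicity c of the part `largest`.
        if remaining == 0:
            return Fraction(1)
        if largest == 0:
            return Fraction(0)
        total = Fraction(0)
        for c in range(remaining // largest + 1):
            weight = Fraction(1, largest ** c * factorial(c))
            total += weight * rec(remaining - c * largest, largest - 1)
        return total
    return rec(n, min(m, n))

for n, m in [(3, 2), (4, 2), (6, 2), (7, 3)]:
    p = smooth_probability(n, m)
    print(n, m, p, float(p))
```

Under this model one gets exactly $2/3$ for degree 3 into degree at most 2, $5/12 \approx 0.417$ for degree 4 into degree at most 2, $19/180 \approx 0.106 < 1/3$ for degree 6 into degree at most 2, and $101/420 \approx 0.240$ for degree 7 into degree at most 3, consistent with the figures used in the rest of Section 3.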

The useful variant: $h_0$ quadratic and $h_1$ affine. We can check again that the numerator in the right-hand side of Equation (5) or (6) is systematically divisible by $\theta h_1(\theta) - h_0(\theta)$. Yet, in this variant, this systematic factor has degree 2 only. This partially improves the value of $p_H$; however, this is not sufficient to get enough equations.

To reduce this degree further, we remark that the bound $(H+1)\cdot D$ on the degree of $[A,B]_D$ given in Proposition 1 can in fact be improved in the specific case where $h_1$ is affine: the degree is then upper-bounded by $(H+1)\cdot D - 1$. Indeed, writing $A = \sum_i a_iX^i$ and $B = \sum_j b_jX^j$, the coefficients of $X^{3D}$ in the two products defining $[A,B]_D$ are both equal to $a_Db_D$ (since $h_0$ is monic), so they cancel. For $H = 2$ and $D = 2$, this reduces the degree from 6 to 5 for free. As a consequence, after dividing by the degree 2 systematic factor of Lemma 1, there remains, as previously, a polynomial of degree 3. Again, the probability $p_H$ is lower-bounded by $2/3 > 1/2$. In both cases, this probability would already suffice to produce enough equations.
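Lemma 1 and the improved degree bound for affine $h_1$ can be checked exhaustively on a toy example. The sketch below uses hypothetical parameters over $\mathbb{F}_7$: $h_0 = 2X + 3$, $h_1 = X(X+1)$ for the simple construction, and $h_0 = X(X+2)$, $h_1 = 3X + 1$ for the useful variant. For every $A = X^2 + a_1X + a_0$ and $B = X + b_0$ it divides $[A,B]_2$ by $Xh_1 - h_0$, checks that the remainder vanishes, and records the largest degrees of the bracket and of the quotient. The helper names are ours.

```python
import itertools

P = 7  # assumption: toy prime field F_7

def trim(a):
    """Reduce mod P and drop leading zeros; a[i] is the coefficient of X^i."""
    a = [c % P for c in a]
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def scale(c, a):
    return trim([c * x for x in a])

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def ppow(a, e):
    out = [1]
    for _ in range(e):
        out = mul(out, a)
    return out

def degree(a):
    a = trim(a)
    return -1 if a == [0] else len(a) - 1

def bracket(A, B, D, h0, h1):
    """[A,B]_D = h1^D * (B * A(h0/h1) - A * B(h0/h1)), for deg A, deg B <= D."""
    def hom(F):  # h1^D * F(h0/h1) = sum_i F_i h0^i h1^(D-i)
        total = [0]
        for i, c in enumerate(trim(F)):
            total = add(total, scale(c, mul(ppow(h0, i), ppow(h1, D - i))))
        return total
    return add(mul(B, hom(A)), scale(-1, mul(A, hom(B))))

def poly_divmod(a, m):
    """Quotient and remainder of a by a nonzero polynomial m over F_P."""
    a, m = trim(a), trim(m)
    inv = pow(m[-1], P - 2, P)  # inverse of the leading coefficient (P prime)
    quo, rem = [0] * max(1, len(a) - len(m) + 1), list(a)
    for k in range(len(a) - len(m), -1, -1):
        c = rem[k + len(m) - 1] * inv % P
        quo[k] = c
        for i, mc in enumerate(m):
            rem[k + i] = (rem[k + i] - c * mc) % P
    return trim(quo), trim(rem)

def check_lemma1(h0, h1, D=2):
    """Return (deg of X*h1 - h0, max deg of [A,B]_D, max deg of the quotient)."""
    systematic = add(mul([0, 1], h1), scale(-1, h0))  # X*h1(X) - h0(X)
    worst_bracket = worst_quotient = -1
    for a0, a1, b0 in itertools.product(range(P), repeat=3):
        A, B = [a0, a1, 1], [b0, 1]  # A = X^2 + a1 X + a0, B = X + b0
        br = bracket(A, B, D, h0, h1)
        quotient, remainder = poly_divmod(br, systematic)
        assert remainder == [0], "Lemma 1 violated"
        worst_bracket = max(worst_bracket, degree(br))
        worst_quotient = max(worst_quotient, degree(quotient))
    return degree(systematic), worst_bracket, worst_quotient

simple = check_lemma1(h0=[3, 2], h1=[0, 1, 1])  # h0 = 2X + 3, h1 = X(X + 1)
useful = check_lemma1(h0=[0, 2, 1], h1=[1, 3])  # h0 = X(X + 2), h1 = 3X + 1
print(simple, useful)
```

In both settings the cofactor left after removing the systematic factor has degree at most 3, which is the polynomial whose smoothness probability is bounded by $2/3$ above.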

Additional Equations. Although the equations obtained with our improved choice of $h_0$ and $h_1$ would suffice, in both the simple construction and the useful variant, to solve the linear system with parameter $D = 2$, a source of extra equations is also helpful. To produce additional equations, we simply consider a variation on the systematic equations that were introduced in [BMV85] and are often used in the Function Field Sieve.

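More precisely, let $f(X) = X^2 + f_1X + f_0$ be an irreducible polynomial of degree 2 in $\mathbb{F}_q[X]$. We can write:

$$f(\theta)^q = f\!\left(\frac{h_0(\theta)}{h_1(\theta)}\right) = \frac{h_0(\theta)^2 + f_1h_0(\theta)h_1(\theta) + f_0h_1(\theta)^2}{h_1(\theta)^2}.$$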
The numerator of the right-hand side is a polynomial of degree 4, since one of the two polynomials $h_0$ or $h_1$ is quadratic and the other one is affine. We remark that about half of these numerators are irreducible, while the other half factor into a product of two degree 2 irreducible polynomials. For a dual Frobenius Representation, the systematic equations are slightly different:

$$f(\theta) = \left(\frac{h_0(\theta)^2 + f_1h_0(\theta)h_1(\theta) + f_0h_1(\theta)^2}{h_1(\theta)^2}\right)^{q},$$

but the principle remains identical. These systematic equations can easily be generalized to irreducible polynomials of arbitrary degree, again with a close to half/half split⁴:


Lemma 2. Let $h_0$ and $h_1$ be two polynomials such that one is affine and the other quadratic. If $f$ is a degree $D$ monic irreducible polynomial in $\mathbb{F}_q[X]$, then $h_1(X)^D f(h_0(X)/h_1(X))$ is a polynomial of degree $2D$ that is irreducible with probability $1 - p$ and factors into two degree $D$ irreducible polynomials with probability $p$, with:

1 (q D - 1 q\- D M +1 -q\ q D + 3 

In particular, note that for irreducible polynomials of degree 1, which are part of the initial factor base for $D = 2$, we always obtain a systematic equation relating the given polynomial either to two other affine polynomials or to one quadratic polynomial. Note also that we could use the systematic equations for higher degree polynomials in Section 3.3 to ease the computation of the logarithms of degree 4 polynomials.
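The dichotomy of Lemma 2 can be observed on a toy example. The brute-force Python sketch below, under the hypothetical choices $q = 7$, $h_0 = 2X + 3$ and $h_1 = X(X+1)$ (simple construction, degree 2 case), forms $h_1^2 f(h_0/h_1) = h_0^2 + f_1h_0h_1 + f_0h_1^2$ for every monic irreducible quadratic $f$ and classifies it as an irreducible quartic or a product of two quadratics; it never has a linear factor, because each of its roots $x$ satisfies $h_0(x)/h_1(x) = \rho$ for a root $\rho$ of $f$. It is an illustration, not the factorization routine of an actual implementation.

```python
P = 7  # assumption: toy prime field F_7; h0, h1 below are hypothetical

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % P
    return out

def lin_comb(terms):
    """Sum of c * poly over (c, poly) pairs, coefficients mod P (constant term first)."""
    out = [0] * max(len(poly) for _, poly in terms)
    for c, poly in terms:
        for i, x in enumerate(poly):
            out[i] = (out[i] + c * x) % P
    return out

def evaluate(a, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(a)) % P

def divisible_by_monic(a, m):
    """True if the monic polynomial m divides a (schoolbook long division mod P)."""
    r = list(a)
    for shift in range(len(r) - len(m), -1, -1):
        c = r[shift + len(m) - 1]
        for i, mc in enumerate(m):
            r[shift + i] = (r[shift + i] - c * mc) % P
    return not any(r)

h0 = [3, 2]     # h0(X) = 2X + 3 (affine)
h1 = [0, 1, 1]  # h1(X) = X^2 + X (quadratic, monic)

monic_quadratics = [[f0, f1, 1] for f0 in range(P) for f1 in range(P)]
irreducible = [f for f in monic_quadratics if all(evaluate(f, x) for x in range(P))]

stays_irreducible = splits = has_root = 0
for f0, f1, _ in irreducible:
    # h1^2 * f(h0/h1) = h0^2 + f1*h0*h1 + f0*h1^2, a quartic since f0 != 0
    n = lin_comb([(1, mul(h0, h0)), (f1, mul(h0, h1)), (f0, mul(h1, h1))])
    if any(evaluate(n, x) == 0 for x in range(P)):
        has_root += 1
    elif any(divisible_by_monic(n, g) for g in monic_quadratics):
        splits += 1             # product of two irreducible quadratics
    else:
        stays_irreducible += 1  # irreducible quartic

print(len(irreducible), stays_irreducible, splits, has_root)
```

On such a small field the two counts only roughly follow the half/half split predicted by Lemma 2; the point of the example is the absence of any other factorization pattern.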


3.2 Enlarging the Factor Base to Degree 3 

In order to enlarge the factor base to degree 3 without performing linear algebra on a matrix of dimension $q^3$, we follow an approach quite similar to the one presented in [Jou14]. Namely, we first divide the set of irreducible polynomials of degree 3 into groups, and then search for a way to generate enough equations involving only the polynomials within a group and polynomials of degree 1 or 2 whose logarithms are already known.


Groups of Degree 3 Polynomials for the Simple Construction. To define a group of degree 3 polynomials, we start from an element $g$ of the base field $\mathbb{F}_q$ and consider the corresponding group $\mathcal{P}_g$ of degree 3 polynomials defined by:

$$\mathcal{P}_g = \{(X^3 + g) + \alpha X^2 + \beta X \mid (\alpha,\beta)\in\mathbb{F}_q^2\}.$$

Clearly, if we generate a relation using Equation (5) or (6) with $A(X) = (X^3 + g) + aX^2$ and $B(X) = (X^3 + g) + bX$, where $a$ and $b$ are in $\mathbb{F}_q$, then all degree 3 polynomials that appear in the left-hand side belong to $\mathcal{P}_g$. The elements of $\mathcal{P}_g$ can be divided into two classes:

4 We prove the following lemma in the extended version of this article. 
> the reducible polynomials, whose logarithms can be computed by summing the logarithms of their factors,

> and the irreducible polynomials, which appear as unknowns. Note that the number of irreducible polynomials in a group $\mathcal{P}_g$ is approximately $q^2/3$.

For a fixed element $g$, by considering all possibilities for $a$ and $b$, we find $q^2$ candidate relations. Yet, we keep only those whose right-hand side factors into terms of degree at most 2. The question is now whether we obtain enough equations to solve the corresponding linear system.

For this, let us look in more detail at the right-hand side. With our choice of $h_0$ and $h_1$, it is a polynomial of degree at most 9, as described in Proposition 1. Moreover, it follows from Lemma 1 that it is divisible by the degree 3 polynomial $\theta h_1(\theta) - h_0(\theta)$. As a consequence, we are left with a polynomial of degree 6 to factor into terms of degree at most 2. The probability of obtaining a good relation is then still not higher than $1/3$, which is the threshold we need here, since the $q^2$ candidate relations must cover about $q^2/3$ unknowns. To improve on this probability, we first remark that, with our specific choice of $A$ and $B$, the degree of the numerator of the right-hand side is in fact 8. Thus, we are left with a polynomial of degree 5 to factor into terms of degree at most 2. Besides, we reveal a very simple systematic factor.

Lemma 3 (Systematic factor of particular 3-brackets in the simple construction). Let $h_0, h_1, A$ and $B$ be four polynomials such that $h_0$ is affine, $h_1(X) = X(X+t)$, $A(X) = (X^3 + g) + \alpha X^2$ and $B(X) = (X^3 + g) + \beta X$, with $t, g, \alpha$ and $\beta$ in $\mathbb{F}_q$. Then $[A,B]_3$ is a polynomial of degree at most 8 divisible by $X$.

Proof. By bilinearity and antisymmetry, we have $[A,B]_3 = \alpha[X^2, X^3+g]_3 + \beta[X^3+g, X]_3 + \alpha\beta[X^2, X]_3$. Let us compute the following 3-brackets:

$$\begin{aligned}
[X, X^2]_3 &= X^2h_0h_1^2 - Xh_0^2h_1,\\
[X^3+g, X]_3 &= X(h_0^3 + gh_1^3) - (X^3+g)h_0h_1^2 = X\left[h_0^3 + gh_1^3 - (X^3+g)h_0X(X+t)^2\right],\\
[X^3+g, X^2]_3 &= X^2(h_0^3 + gh_1^3) - (X^3+g)h_0^2h_1 = X\left[X(h_0^3 + gh_1^3) - (X^3+g)h_0^2(X+t)\right].
\end{aligned}$$

The result of the lemma comes from the fact that all the 3-brackets involved in the computation of $[A,B]_3$ are divisible by $X$. Moreover, considering the degrees of these polynomials, we remark that $[X, X^2]_3$ has degree 7, whereas $[X^3+g, X]_3$ and $[X^3+g, X^2]_3$ have degree at most 8. □
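Lemma 3, combined with Lemma 1, can be checked exhaustively on a toy example. The sketch below assumes hypothetical parameters over $\mathbb{F}_7$ ($r = 2$, $s = 3$, $t = 1$, so $h_0 = 2X + 3$ and $h_1 = X(X+1)$), runs over all $g$, $a$, $b$, and verifies that each $[A,B]_3$ has degree at most 8 and is divisible by $X\cdot(Xh_1 - h_0)$; the two factors are coprime because $s \neq 0$. The helper names are ours.

```python
P = 7  # assumption: toy prime field F_7

def trim(a):
    """Reduce mod P and drop leading zeros; a[i] is the coefficient of X^i."""
    a = [c % P for c in a]
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def scale(c, a):
    return trim([c * x for x in a])

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def ppow(a, e):
    out = [1]
    for _ in range(e):
        out = mul(out, a)
    return out

def degree(a):
    a = trim(a)
    return -1 if a == [0] else len(a) - 1

def bracket(A, B, D, h0, h1):
    """[A,B]_D = h1^D * (B * A(h0/h1) - A * B(h0/h1)), for deg A, deg B <= D."""
    def hom(F):  # h1^D * F(h0/h1) = sum_i F_i h0^i h1^(D-i)
        total = [0]
        for i, c in enumerate(trim(F)):
            total = add(total, scale(c, mul(ppow(h0, i), ppow(h1, D - i))))
        return total
    return add(mul(B, hom(A)), scale(-1, mul(A, hom(B))))

def poly_divmod(a, m):
    """Quotient and remainder of a by a nonzero polynomial m over F_P."""
    a, m = trim(a), trim(m)
    inv = pow(m[-1], P - 2, P)  # inverse of the leading coefficient (P prime)
    quo, rem = [0] * max(1, len(a) - len(m) + 1), list(a)
    for k in range(len(a) - len(m), -1, -1):
        c = rem[k + len(m) - 1] * inv % P
        quo[k] = c
        for i, mc in enumerate(m):
            rem[k + i] = (rem[k + i] - c * mc) % P
    return trim(quo), trim(rem)

r, s, t = 2, 3, 1                 # hypothetical: h0 = rX + s, h1 = X(X + t)
h0, h1 = [s, r], [0, t, 1]
systematic = add(mul([0, 1], h1), scale(-1, h0))  # X*h1 - h0 (Lemma 1), degree 3
known_factor = mul([0, 1], systematic)            # X * (X*h1 - h0), degree 4

worst = -1
for g in range(P):
    for a in range(P):
        for b in range(P):
            A = [g, 0, a, 1]  # A = X^3 + g + a X^2
            B = [g, b, 0, 1]  # B = X^3 + g + b X
            br = bracket(A, B, 3, h0, h1)
            assert degree(br) <= 8
            quotient, remainder = poly_divmod(br, known_factor)
            assert remainder == [0]
            worst = max(worst, degree(quotient))
print(worst)  # largest degree of the remaining cofactor
```

The remaining cofactor never exceeds degree 4, which is the polynomial whose smoothness is discussed next.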

As a direct consequence, the remaining factor in the right-hand side when considering these groups has degree 4. According to Appendix B, the heuristic probability that it factors into terms of degree at most 2 is close to 41%. Since this is greater than $1/3$, we expect to find enough equations to compute all the discrete logarithms of the irreducible polynomials belonging to $\mathcal{P}_g$. Moreover, it is clear that any monic irreducible polynomial of degree 3 belongs to one of the groups $\mathcal{P}_g$.
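The 41% figure is an asymptotic value; for a concrete $q$ the exact proportion can be counted. The sketch below (hypothetical function names) counts monic degree $n$ polynomials over $\mathbb{F}_q$ whose irreducible factors all have degree at most $m$, reading the count off the generating function $\prod_{k\le m}(1-t^k)^{-I_q(k)}$, where $I_q(k)$ is the number of monic irreducible polynomials of degree $k$. This is a standard counting identity, not a method taken from the text.

```python
from math import comb

def mobius(n):
    """Moebius function by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def count_irreducible(q, n):
    """Number of monic irreducible polynomials of degree n over F_q."""
    return sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0) // n

def count_smooth(q, n, m):
    """Monic degree-n polynomials over F_q with all irreducible factors of degree <= m:
    the coefficient of t^n in prod_{k<=m} (1 - t^k)^(-I_q(k))."""
    series = [1] + [0] * n
    for k in range(1, m + 1):
        i_k = count_irreducible(q, k)
        # (1 - t^k)^(-i_k) = sum_j C(i_k + j - 1, j) t^(k*j): pick j degree-k factors
        # with repetition among the i_k irreducibles of degree k.
        factor = [0] * (n + 1)
        for j in range(n // k + 1):
            factor[k * j] = comb(i_k + j - 1, j)
        series = [sum(series[a] * factor[d - a] for a in range(d + 1)) for d in range(n + 1)]
    return series[n]

for q in (3, 7, 3 ** 5):
    print(q, count_smooth(q, 4, 2) / q ** 4)
```

For $q = 3$, for instance, 39 of the 81 monic quartics have all their factors of degree at most 2; as $q$ grows the proportion approaches $5/12 \approx 0.417$.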
Groups of Degree 3 Polynomials for the Useful Variant. In this setting, computing discrete logarithms of degree 3 polynomials is a bit more tricky. To define a group in this case, we start from a triple $(g_1, g_2, g_3)$ of elements of $\mathbb{F}_q$. The corresponding group of degree 3 polynomials is defined as:

$$\mathcal{P}_{g_1,g_2,g_3} = \{X^2(X - g_1) + \alpha X(X - g_2) + \beta(X - g_3) \mid (\alpha,\beta)\in\mathbb{F}_q^2\}.$$

Let us fix $(g_1, g_2, g_3)\in\mathbb{F}_q^3$. If we generate a relation using Equation (5) with $A(X) = X^2(X - g_1) + a(X - g_3)$ and $B(X) = X(X - g_2) + b(X - g_3)$, with $a$ and $b$ in $\mathbb{F}_q$, then all degree 3 polynomials that appear in the left-hand side belong to the corresponding group $\mathcal{P}_{g_1,g_2,g_3}$. After keeping, among the $q^2$ candidate relations, only those whose right-hand side factors into terms of degree at most 2, the question is, again, whether we obtain enough equations to solve the linear system whose unknowns are the roughly $q^2/3$ irreducible polynomials of $\mathcal{P}_{g_1,g_2,g_3}$.

When $h_0$ is quadratic and $h_1$ affine, the right-hand side is still a polynomial of degree 8 divisible by $\theta h_1(\theta) - h_0(\theta)$. We are left with a polynomial of degree 6 to factor into terms of degree at most 2. Yet, without any further improvement, the probability that this remaining polynomial factors into terms of degree at most 2 is still too small to obtain enough equations.

To overcome this obstacle, we no longer consider general groups of this form. Our goal is to point out some groups in which we know that the right-hand sides have some extra systematic factors. Another argument for considering only a few special groups is that the number of degree 3 polynomials produced by all these general groups is far too large: taking all $q^3$ groups of the form $\mathcal{P}_{g_1,g_2,g_3}$ is a clear overkill, since they each contain $q^2$ elements whereas there are only $q^3$ monic polynomials of degree 3. In fact, we expect that these polynomials could be mostly covered by $q$ groups only. In a nutshell, we restrict ourselves to a specific choice of $g_1$, $g_2$ and $g_3$: we first choose a value $g_1\in\mathbb{F}_q$ and then compute

$$g_2 = G(g_1) \quad\text{and}\quad g_3 = G(g_2),$$

where $G:\mathbb{F}_q\to\mathbb{F}_q$ is a particular map. We propose to consider:

$$G: g \mapsto \frac{v(v+w)}{(1+u)(v+w-g)}. \tag{7}$$

We recall that $u, v, w$ denote the coefficients of the polynomials $h_0$ and $h_1$, as given in (4). Assuming that neither $g_1$ nor $g_2$ is equal to $v + w$, all three values $(g_1, g_2, g_3)$ are well-defined. With this specific choice, the right-hand side that now appears in Equation (5) or (6) gains a new systematic degree 2 factor $\theta h_1(\theta) + h_0(\theta) + (v+w)h_1(\theta) = (1+u)\theta^2 + (1+u)(v+w)\theta + vw + v^2$, as given in Lemma 4. Again, the remaining factor in the right-hand side when considering these groups has degree 4. Since the probability that a degree 4 polynomial factors into terms of degree at most 2 is higher than $1/3$, we can recover all the discrete logarithms of the irreducible polynomials of $\mathcal{P}_{g_1,G(g_1),G(G(g_1))}$.


Simplified Setting for Frob. Representation Dlogs 389 


Lemma 4 (Systematic factor of particular 3-brackets in the useful variant). Let $G$ denote the map of (7), and let $h_0, h_1, A$ and $B$ be four polynomials such that $h_0(X) = X(X+w)$, $h_1(X) = uX + v$, $A(X) = X^2(X - g) + a(X - G(G(g)))$ and $B(X) = X(X - G(g)) + b(X - G(G(g)))$, with $u, v, w, a, b$ and $g$ in $\mathbb{F}_q$. Then $[A,B]_3$ is divisible by $(1+u)X^2 + (1+u)(v+w)X + vw + v^2$.

Proof. By bilinearity and antisymmetry, $[A,B]_3 = [X^2(X-g), X(X-G(g))]_3 + b[X^2(X-g), X - G(G(g))]_3 + a[X - G(G(g)), X(X-G(g))]_3$. The result of the lemma comes from the computation of the 3-brackets of the three pairs of distinct elements among $X^2(X-g)$, $X(X-G(g))$ and $X - G(G(g))$. □


Fraction of Degree 3 Polynomials Covered by Our Groups. Since we can recover all the discrete logarithms of the irreducible polynomials that appear in a group, the question that remains is whether every polynomial belongs to at least one of these groups.

Valid groups. In the sequel, we restrict ourselves to the case where $v + w \neq 0$; indeed, if $v + w = 0$, then $G$ is the zero mapping. This case is studied in the extended version of our article. To study the properties of our groups, it is convenient to remark that, since $G$ is a homography, we can turn it into a permutation of the projective line $\mathbb{P}_1(\mathbb{F}_q)$. As is classically done, we add the two following values of $G$:

$$G(\infty) = 0 \quad\text{and}\quad G(v+w) = \infty.$$

With this additional definition, we see that the groups we consider are indexed by triples $(g, G(g), G(G(g)))$ which do not contain the value $\infty$. Since $v + w \neq 0$, the point $\infty$ belongs to a cycle of length at least 3. Thus, there are $q - 2$ valid groups, corresponding to the values of $g$ in $\mathbb{F}_q\setminus\{G^{-1}(\infty), G^{-1}(G^{-1}(\infty))\}$. With this description, we reach at best $q^3 - 2q^2$ polynomials of degree 3.

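Groups at infinity. To reach more polynomials, we define three additional groups $\mathcal{P}_{g,G(g),G(G(g))}$ for the cases where $g$, $G(g)$ or $G(G(g))$ is equal to $\infty$. These groups are given by the following descriptions:

$$\begin{aligned}
\mathcal{P}_{\infty,0,G(0)} &= \left\{X\left(X^2 + \tfrac{v(v+w)}{1+u}\right) + \alpha X^2 + \beta(X - G(0)) \;\middle|\; (\alpha,\beta)\in\mathbb{F}_q^2\right\},\\
\mathcal{P}_{\infty^{-1},\infty,0} &= \left\{X^2(X - \infty^{-1}) + \alpha X + \beta\left(X^2 + \tfrac{v(v+w)}{1+u}\right) \;\middle|\; (\alpha,\beta)\in\mathbb{F}_q^2\right\},\\
\mathcal{P}_{\infty^{-2},\infty^{-1},\infty} &= \left\{X^2(X - \infty^{-2}) + \alpha X(X - \infty^{-1}) + \beta \;\middle|\; (\alpha,\beta)\in\mathbb{F}_q^2\right\},
\end{aligned}$$

where $\infty^{-1}$ stands for $G^{-1}(\infty)$ and $\infty^{-2}$ for $G^{-1}(G^{-1}(\infty))$. We remark that these three extra groups at infinity satisfy the same systematic divisibility properties as the usual groups. Moreover, we increase the number of available polynomials to $q^3 + q^2$, which is now enough to possibly cover all the monic degree 3 polynomials.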
Covering every degree 3 polynomial. Let $P(X) = X^3 + a_2X^2 + a_1X + a_0$ be an arbitrary monic polynomial of degree 3. If $P$ belongs to a valid group $\mathcal{P}_{g,G(g),G(G(g))}$, there exist $\alpha$ and $\beta$ such that:

$$\alpha - g = a_2,\qquad \beta - \alpha G(g) = a_1,\qquad -\beta G(G(g)) = a_0.$$

Substituting these equations into each other, we find that this implies:

$$a_0 = -\bigl(a_1 + (a_2 + g)G(g)\bigr)\cdot G(G(g)). \tag{8}$$

After simplification, this becomes $H_{a_1,a_2}(g) = a_0$, where $H_{a_1,a_2}$ is a homography whose coefficients depend on $a_1$ and $a_2$. If there is no degeneracy in the coefficients of $H_{a_1,a_2}$, there is exactly one possible value for $g$. Let us write the homography as $H_{a_1,a_2}(g) = \frac{\lambda + \mu g}{\lambda' + \mu' g}$, where $\lambda = -v(w+v)((1+u)a_1 + va_2)$, $\mu = v((1+u)a_1 - v(v+w))$, $\lambda' = (1+u)(u(v+w)+w)$ and $\mu' = -(1+u)^2$. Thus, several cases appear:

- If $a_0 \neq \mu/\mu'$, then the homography is invertible.

• As a consequence, as long as $g \neq \infty^{-1}$ and $g \neq \infty^{-2}$, the polynomial $P$ belongs to the valid group generated by $g = H_{a_1,a_2}^{-1}(a_0)$, $G(g)$ and $G(G(g))$, and only to this one. There are $q^3 - 3q^2$ such polynomials.

• If $g = \infty^{-1}$, then $H_{a_1,a_2}(\infty^{-1}) = a_0$ becomes $a_0(\lambda' + \mu'(v+w)) = \lambda + \mu(v+w)$, and finally $a_0 = \infty^{-1}v(a_2 + \infty^{-1})/(1+u)$. Besides, $P$ belongs to the group at infinity $\mathcal{P}_{\infty^{-1},\infty,0}$ if there exist $\alpha$ and $\beta$ such that $\beta - \infty^{-1} = a_2$, $\alpha = a_1$ and $\beta v(v+w)/(1+u) = a_0$. Substituting these equations into each other, we find that this implies $a_0 = \infty^{-1}v(a_2 + \infty^{-1})/(1+u)$. Thus, the polynomial $P$ belongs to $\mathcal{P}_{\infty^{-1},\infty,0}$. There are $q^2 - q$ such polynomials.

• Similarly, if $g = \infty^{-2}$, then $P$ belongs to the group at infinity $\mathcal{P}_{\infty^{-2},\infty^{-1},\infty}$ and, again, there are $q^2 - q$ such polynomials.

- If $a_0 = \mu/\mu'$, then Equation (8) is equivalent to $0 = g(a_0\mu' - \mu) = \lambda - a_0\lambda'$. Moreover, requiring $\lambda = a_0\lambda'$ leads to $a_2 = \kappa(\kappa'a_1 + \kappa'')$, where $\kappa = (1+u)/(v^2(v+w))$, $\kappa' = v(v+w)$ and $\kappa'' = -u(v+w) - w$.

• If $a_2 = \kappa(\kappa'a_1 + \kappa'')$, then $P$ belongs to all the valid groups. There are $q$ such polynomials.

• If $a_2 \neq \kappa(\kappa'a_1 + \kappa'')$, the question is whether the $q^2 - q$ remaining polynomials belong to a group at infinity. Fortunately, if $\alpha$ denotes $a_2$ and $\beta$ denotes $a_1 - v(v+w)/(1+u)$, then we have the following equality between polynomials: $X(X^2 + v(w+v)/(1+u)) + \alpha X^2 + \beta(X - G(0)) = X^3 + a_2X^2 + a_1X + v(v(v+w) - a_1(1+u))/(1+u)^2 = P(X)$. As a consequence, $P$ belongs to the group at infinity $\mathcal{P}_{\infty,0,G(0)}$.

Remark 1. The previous proof does not depend on the restriction on $a_2$. Thus, the $q$ polynomials satisfying $a_0 = \mu/\mu'$ and $a_2 = \kappa(\kappa'a_1 + \kappa'')$ also belong to the group at infinity $\mathcal{P}_{\infty,0,G(0)}$. Moreover, we notice that each intersection between two groups at infinity consists of $q$ polynomials.




3.3 Discrete Logarithms of Degree 4 Polynomials 

Previous Deadlocks. The natural approach for computing the logarithm of $I_4(\theta)$, where $I_4$ is an irreducible polynomial of degree 4, is to start from the two polynomials $A(X) = X^3 + a_1X + a_0$ and $B(X) = X^2 + b_1X + b_0$, construct a relation from Equation (5), and require that $I_4$ divides $[A,B]_3$. Rewriting this last condition as $[A,B]_3 \equiv 0 \pmod{I_4}$, we obtain 4 bilinear equations in the 4 unknowns $(a_0, a_1, b_0, b_1)$. Experimentally, as explained in [Jou14], this system is easy to solve using standard Gröbner basis algorithms. However, on average, the system has solutions for only half of the degree 4 polynomials. As a consequence, the other half of the polynomials are not accessible using this technique.

Another idea, already present in [AMORH14], is to use the additional relations from Section 3.1 to improve the probability of success. For an irreducible polynomial of degree 4 that failed to be expressed in terms of degree 3 polynomials, there is a $1/2$ chance that its image under Frobenius, whose degree is 8, factors into 2 quartic polynomials. Each of them has a $1/2$ chance of being expressed in terms of degree 3 polynomials. Thus, for a polynomial that failed, we have a $1/8$ chance to compute its logarithm through this process. This increases the global probability of success for a degree 4 irreducible polynomial to $9/16$. Repeating the process, we can further improve the success probability. Heuristically, we expect a probability of $p_0 = (4 - \sqrt{8})/2 \approx 0.586$, which is the smaller root of $p = 1/2 + p^2/4$. Unfortunately, this does not suffice to obtain all degree 4 polynomials. Several techniques have been considered to bypass this problem, but none of them is sufficient in the general case. We propose here an approach that fits the simple construction, whereas the useful (but tricky) variant is detailed in the extended version of the article.
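The value of $p_0$ can be reproduced with a simple fixed-point iteration. We assume (our reading of the argument above, not a statement of the text) that each round succeeds directly with probability $1/2$ and otherwise, with probability $1/2$, yields two quartics that must both succeed; this model reproduces the quoted $1/8$, $9/16$ and $(4 - \sqrt{8})/2$. The function name is hypothetical.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)

def next_probability(p):
    """One more round of the heuristic: a degree 4 polynomial succeeds directly with
    probability 1/2; otherwise (probability 1/2) its Frobenius image splits into two
    quartics with probability 1/2, and both must succeed, each with probability p."""
    return HALF + HALF * HALF * p * p

first = next_probability(HALF)  # exact Fraction: 1/2 + 1/16
p = 0.5                         # floats for the limit, to keep numbers small
for _ in range(100):
    p = next_probability(p)
print(first, p, (4 - math.sqrt(8)) / 2)
```

The first iterate is exactly $9/16$, and the sequence increases to $2 - \sqrt{2} = (4 - \sqrt{8})/2$.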

Improved Approach for Degree 4 Polynomials for the Simple Construction. The general approach we propose consists in dividing the degree 4 polynomials into groups of size $q^3$ and following an approach close to the one presented for degree 3 polynomials in Section 3.2. We first compute all the discrete logarithms of a group $\mathcal{Q}_g$ of degree 4 polynomials of the form:

$$\mathcal{Q}_g = \{(X^4 + g) + \alpha X^3 + \beta X^2 + \gamma X \mid (\alpha,\beta,\gamma)\in\mathbb{F}_q^3\}. \tag{9}$$

To do so, we use a partition $\mathcal{Q}_g = \bigsqcup_{g'\in\mathbb{F}_q}\mathcal{Q}_{g,g'}$, where:

$$\mathcal{Q}_{g,g'} = \{(X^4 + g) + \alpha X^3 + \beta X^2 + g'X \mid (\alpha,\beta)\in\mathbb{F}_q^2\}. \tag{10}$$

To build relations involving the polynomials of $\mathcal{Q}_{g,g'}$, we apply Equation (5) with polynomials of the form $A(X) = (X^4 + g) + aX^2 + g'X$ and $B(X) = X^3 + bX^2$. With the simple construction, Lemma 5 shows that $[A,B]_4$ has degree at most 11 and a systematic factor of degree one. Together with the general degree 3 systematic factor coming from Lemma 1, we are left with a polynomial of degree 7. According to Appendix B, the probability that it factors into terms of degree at most 3 is about 24%.

Besides, the number of irreducible polynomials in $\mathcal{Q}_{g,g'}$ is close to $q^2/4$. Combining with previous techniques, after removing the irreducible polynomials whose logarithms can already be obtained, we are left with approximately $(1 - 0.586)\cdot q^2/4 \approx 0.10\,q^2$ unknowns. Since the $q^2$ candidate pairs $(a,b)$ yield about $0.24\,q^2$ good equations, we obtain enough equations to solve the linear system. Finally, we recover the discrete logarithms of $\mathcal{Q}_g$ by computing those of its $q$ subgroups.

Lemma 5 (Systematic factor of particular 4-brackets in the simple construction⁵). Let $h_0, h_1, A$ and $B$ be four polynomials in $\mathbb{F}_q[X]$ such that $h_0$ is affine, $h_1(X) = X(X+t)$, $A(X) = (X^4 + g) + \alpha X^2 + \alpha'X$ and $B(X) = X^3 + \beta X^2 + \beta'X$. Then $[A,B]_4$ is a polynomial of degree at most 11 divisible by $X$.
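Lemma 5, together with Lemma 1, can be spot-checked in the same toy setting. The sketch below assumes hypothetical parameters over $\mathbb{F}_7$ ($h_0 = 2X + 3$, $h_1 = X(X+1)$), draws random coefficients $g, \alpha, \alpha', \beta, \beta'$, and verifies that $[A,B]_4$ has degree at most 11 and is divisible by $X\cdot(Xh_1 - h_0)$, leaving a cofactor of degree at most 7. It is a randomized check, not a proof, and the helper names are ours.

```python
import random

P = 7  # assumption: toy prime field F_7

def trim(a):
    """Reduce mod P and drop leading zeros; a[i] is the coefficient of X^i."""
    a = [c % P for c in a]
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)])

def scale(c, a):
    return trim([c * x for x in a])

def mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def ppow(a, e):
    out = [1]
    for _ in range(e):
        out = mul(out, a)
    return out

def degree(a):
    a = trim(a)
    return -1 if a == [0] else len(a) - 1

def bracket(A, B, D, h0, h1):
    """[A,B]_D = h1^D * (B * A(h0/h1) - A * B(h0/h1)), for deg A, deg B <= D."""
    def hom(F):  # h1^D * F(h0/h1) = sum_i F_i h0^i h1^(D-i)
        total = [0]
        for i, c in enumerate(trim(F)):
            total = add(total, scale(c, mul(ppow(h0, i), ppow(h1, D - i))))
        return total
    return add(mul(B, hom(A)), scale(-1, mul(A, hom(B))))

def poly_divmod(a, m):
    """Quotient and remainder of a by a nonzero polynomial m over F_P."""
    a, m = trim(a), trim(m)
    inv = pow(m[-1], P - 2, P)  # inverse of the leading coefficient (P prime)
    quo, rem = [0] * max(1, len(a) - len(m) + 1), list(a)
    for k in range(len(a) - len(m), -1, -1):
        c = rem[k + len(m) - 1] * inv % P
        quo[k] = c
        for i, mc in enumerate(m):
            rem[k + i] = (rem[k + i] - c * mc) % P
    return trim(quo), trim(rem)

h0, h1 = [3, 2], [0, 1, 1]                      # hypothetical h0 = 2X + 3, h1 = X(X + 1)
known_factor = mul([0, 1], add(mul([0, 1], h1), scale(-1, h0)))  # X * (X*h1 - h0)

rng = random.Random(5)
worst = -1
for _ in range(300):
    g, a, a_prime, b, b_prime = (rng.randrange(P) for _ in range(5))
    A = [g, a_prime, a, 0, 1]  # A = X^4 + g + a X^2 + a' X
    B = [0, b_prime, b, 1]     # B = X^3 + b X^2 + b' X
    br = bracket(A, B, 4, h0, h1)
    assert degree(br) <= 11
    quotient, remainder = poly_divmod(br, known_factor)
    assert remainder == [0]
    worst = max(worst, degree(quotient))
print(worst)  # largest degree of the remaining cofactor
```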

Computing the remaining discrete logarithms. Let $I_4\notin\mathcal{Q}_g$ be a degree 4 polynomial. We start again from $A(X) = (X^4 + g) + aX^2 + a'X$ and $B(X) = X^3 + bX^2 + b'X$, and apply Equation (5) to construct a relation such that $I_4$ divides $[A,B]_4$. As in [Jou14], the heuristic probability of finding a solution of the bilinear system is $1/2$. Extracting the degree one factor of Lemma 5 and the general degree 3 systematic factor of Lemma 1, and then dividing the degree 11 polynomial $[A,B]_4$ by our degree 4 polynomial $I_4$, we are left with a polynomial of degree 3, whose logarithm is already known. Thus, with only one group of the form (9), we recover the discrete logarithms of approximately half⁶ of the missing irreducible polynomials of degree 4.

To obtain the remaining polynomials, we recursively apply this method to other groups of the form (9). We show in Section 4.3 that $O(\log q)$ such groups suffice and that the cost of their computation is asymptotically dominated by the cost of the first one, which is $O(q^6)$, as announced.

4 Asymptotic Complexities 

4.1 Recovering Discrete Logs of Degree 2 Irreducible Polynomials 

We need to collect about $q^2$ equations in the Relation Collection phase. Since the probability of obtaining a good relation is lower-bounded by $2/3$, this phase costs $O(q^2)$ operations. We then perform a sparse linear algebra phase on a matrix of size $O(q^2)$. We recall that, due to the form of the relations that are created, the number of entries in each row is $O(q)$. The total cost to recover the discrete logarithms of degree 2 polynomials is thus $O((q^2)^2\cdot q) = O(q^5)$.


4.2 Recovering Discrete Logs of Degree 3 Irreducible Polynomials 

With the really simple construction. Since each group $\mathcal{P}_g$ contains $O(q^2)$ unknowns and since the linear algebra is done with a matrix containing $O(q)$ entries per row, the cost of computing a single group is $O(q^5)$. There are $q$ such groups, and the global cost is thus $O(q^6)$.

5 The proof of this lemma works like that of Lemma 3.

6 The probability of recovering the logarithm of a missing polynomial is in fact higher than 1/2, since we can use the additional equations presented in Section 3.1. Even though they are very useful in practice, the 1/2 probability already suffices for the analysis.
With the useful variant. We consider 3 groups at infinity and $q - 2$ valid groups, with $O(q^2)$ unknowns each. Thus, the global cost of this phase is $O(q^6)$.


4.3 Recovering Discrete Logs of Degree 4 Irreducible Polynomials 

With the simple construction. We first compute the discrete logarithms of one group of the form (10). Since we have a system of dimension $O(q^2)$ with $O(q)$ entries per row, it can be solved at a cost of $O(q^5)$. To recover the logarithms of one group of the form (9), which is the union of $q$ groups of the form (10), we thus need $O(q^6)$ operations.

Besides, the probability of recovering the logarithm of an irreducible degree 4
polynomial from the first group of the form (9) is heuristically 1/2. Assuming
that the probabilities are independent, with $k$ such groups the proportion
of discrete logarithms left unknown is $1/2^k$. Clearly, as the number
of available groups grows, this proportion quickly tends to 0. With $O(\log q)$
such groups we expect to obtain all degree 4 polynomials. As a consequence,
performing the computation of $O(\log q)$ groups in this direct way, we would
obtain a global complexity of $O(q^6 \log q)$. However, this overlooks the fact that
for each new group we wish to compute, the size of the corresponding linear
system decreases, and the rate of decrease follows a geometric progression⁷. As a
consequence, the cost of computing the required $O(\log q)$ groups is dominated
by the computation of the first one.
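The domination by the first group can be made explicit under the assumption that each new group halves the size of the remaining systems (the ratio $1/2$ is the heuristic used above; any fixed ratio $\rho < 1$ gives the same conclusion with constant $1/(1-\rho^2)$) and that the cost of a group scales with the square of its system size. Writing $C_0 = O(q^6)$ for the cost of the first group and $N_4 \approx q^4/4$ for the number of monic irreducible quartics, this sketch gives:

$$\sum_{i=0}^{k-1} C_0 \Big(\frac{1}{2^i}\Big)^2 \;\le\; C_0 \sum_{i \ge 0} 4^{-i} \;=\; \frac{4}{3}\, C_0 = O(q^6), \qquad \frac{N_4}{2^k} < 1 \iff k > \log_2 N_4 = O(\log q).$$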

Hence, the total complexity of the precomputation phases becomes $O(q^6)$.
This has to be compared with the previous $O(q^7)$ complexity⁸ for the same phases.
However, we recall that the part of the algorithm that dominates the asymptotic
complexity of each Frobenius Representation algorithm is the Descent phase,
which is not under consideration in this article.

5 A Computational Example in Characteristic 3 

To illustrate our algorithm, we have implemented our new ideas for a real-sized
example in characteristic 3. Namely, we let $q = 3^5$ and define $\mathbb{F}_q = \mathbb{F}_3[a]$, where
$a$ satisfies $a^5 - a + 1 = 0$. Choosing $h_0 = X^2 + a^{111} X$ and $h_1 = aX + 1$, we see
that $X h_1(X^q) - h_0(X^q)$ has an irreducible factor of prime degree 479. We let $U$
denote a root of this irreducible polynomial and construct $\mathbb{F}_{3^{5 \cdot 479}}$ as $\mathbb{F}_q[U]$.

The cardinality of the finite field we consider is a 3796-bit integer. A good
point of comparison is the computation over $\mathbb{F}_{2^{12 \cdot 367}}$ performed in [GKZ14a].

7 Another option is to continue the computation for all groups. Due to the geometric
progression, the complexity of this part is the same. It yields a lower total runtime
than recomputing the missing logarithms of degree 4 polynomials on the fly when
required, but as a side effect it raises the required amount of storage.

8 We consider here algorithms of the Wiedemann or Lanczos families, which have a complexity
of $O(n^2)$ for a square matrix with $n$ columns. Using dense linear algebra with
fast matrix multiplication instead of sparse linear algebra would lower the asymptotic
complexity from $O(q^6)$ to $O(q^{5.746})$. We choose not to consider these algorithms
here since they are not at all competitive in practice.
Indeed, even though the bit size of that computation was slightly larger than ours,
at 4404 bits, this total size included a factor of two in the exponent,
which comes for free when using the older Frobenius Representation algorithms.
More precisely, the main drawback of our approach is that, instead of computing
logarithms in the field $\mathbb{F}_{q^{2k}}$, it only computes in $\mathbb{F}_{q^k}$. Many cryptographers have
commented on this free factor, arguing that it is not really relevant in practice
and that one should rather consider extension fields of prime degree that can be
embedded in the target field. For us, this is $\mathbb{F}_{3^{479}}$, a 760-bit field. This can also
be compared with the largest computation of this form performed so far, in the
finite field $\mathbb{F}_{2^{809}}$ (see [BBD+13]).
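The parameters of this example can be sanity-checked with a short Python sketch. It only verifies, by naive trial division, that $a^5 - a + 1$ is irreducible over $\mathbb{F}_3$ (the defining polynomial must have degree 5 since $q = 3^5$), and recomputes the bit sizes quoted above; it does not reproduce the factorization of $X h_1(X^q) - h_0(X^q)$. The helper names `poly_rem` and `is_irreducible` are ours.

```python
from itertools import product


def poly_rem(f, g, p):
    """Remainder of f modulo the monic polynomial g over F_p.

    Polynomials are coefficient lists, lowest degree first.
    """
    f = [c % p for c in f]
    dg = len(g) - 1
    for k in range(len(f) - 1, dg - 1, -1):
        c = f[k]
        if c:
            # subtract c * X^(k - dg) * g to cancel the X^k term
            for t in range(dg + 1):
                f[k - dg + t] = (f[k - dg + t] - c * g[t]) % p
    return f[:dg]


def is_irreducible(f, p):
    """Trial division by every monic polynomial of degree 1 .. deg(f) // 2.

    Exponential in the degree: only meant for tiny cases such as this one.
    """
    n = len(f) - 1
    for d in range(1, n // 2 + 1):
        for low in product(range(p), repeat=d):
            if not any(poly_rem(f, list(low) + [1], p)):
                return False
    return True


# 1 - a + a^5, i.e. a^5 - a + 1, lowest degree first
print(is_irreducible([1, -1, 0, 0, 0, 1], 3))  # True
print((3 ** (5 * 479)).bit_length())           # 3796 bits for F_{3^(5*479)}
print((3 ** 479).bit_length())                 # 760 bits for F_{3^479}
print(12 * 367)                                # 4404, exponent of F_{2^(12*367)}
```

Trial division is hopeless for large degrees; it is used here only because the degree is 5, so checking divisors of degree 1 and 2 suffices.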

With this example, computing all the discrete logarithms of the factor base 
with D = 2, containing 29646 irreducible polynomials, required 16 sequential 
hours on a single core of an Intel Core i7 at 2.7 GHz. The equations themselves 
took 35 seconds to produce, the 16 hours being the cost of the linear algebra 
modulo: 

$$M = \frac{3^{5 \cdot 479} - 1}{488246858}.$$

Enlarging the factor base to degree 3 polynomials was performed with 244
independent computations, each involving 19602 unknowns in the corresponding
linear system. On the same machine, the sequential cost of one such computation 
is 6.5 hours. Since these computations are independent, they are straightforward 
to parallelize. 
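The sizes quoted here can be cross-checked with Gauss's formula for the number $N_k$ of monic irreducible polynomials of degree $k$ over $\mathbb{F}_q$. The sketch assumes that the factor base for $D = 2$ consists of all monic irreducibles of degree 1 and 2, which matches the count of 29646; it also shows that 244 groups of 19602 unknowns account for exactly $N_3$ unknowns, which is consistent with (but does not by itself prove) the groups covering each irreducible cubic once.

```python
def mobius(n):
    """Moebius function via trial division (fine for small n)."""
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0  # n has a square factor
            result = -result
        p += 1
    return -result if m > 1 else result


def num_monic_irreducibles(q, k):
    """Gauss's formula: N_k = (1/k) * sum over d | k of mu(d) * q^(k/d)."""
    total = sum(mobius(d) * q ** (k // d) for d in range(1, k + 1) if k % d == 0)
    return total // k


q = 3 ** 5
n1, n2, n3 = (num_monic_irreducibles(q, k) for k in (1, 2, 3))
print(n1 + n2)            # 29646
print(n3, 244 * 19602)    # 4782888 4782888
```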

For degree 4 polynomials, the first subset of 243 independent computations
we considered contained on average 7385 unknowns in each linear system. The
largest system contained 7571 unknowns and the smallest 7212. Note that this
used a suboptimal variation of the technique presented in Section 3.3 and induced
slightly larger systems. Using the correct variation, we would expect a smaller
number of unknowns per linear system (around 6100).

The second subset has on average 3674 unknowns, the third 1829, the fourth
909, the fifth 452. We see that, as predicted, the rate of decrease is very steep:
essentially a geometric series of ratio 1/2. As a consequence, the runtimes for these
subsets rapidly become negligible compared to the main part of the computation,
which consists in tackling the degree 3 polynomials. Here again, our implementation
is suboptimal, but this was not a critical part of the computation. In fact, for all
subsets beyond the fifth, we only tried to express the logarithms of the elements in terms
of the first four subsets. Indeed, the resulting systems were so small (around 450
unknowns) and sparse that they could be solved with a straightforward Gaussian
elimination. Thus, for these subsets, the running time was dominated by
the generation of the equations (around 2 h for each subset), and it did not make
sense to insist on reducing the size of the linear systems. In total, we computed
30 subsets, and they were enough to express the logarithms of all the degree 4
elements encountered later during the computation.
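The measured subset sizes can be checked against the geometric-series argument of Section 4.3. This sketch assumes that solving a subset costs roughly the square of its number of unknowns, and uses the independence heuristic that a fraction $2^{-k}$ of quartics is still unknown after $k$ subsets; under these assumptions about 30 subsets are needed before fewer than one of the $N_4$ irreducible quartics is expected to remain, which is in line with the 30 subsets actually computed.

```python
# Average number of unknowns in the first five degree-4 subsets (from the text)
sizes = [7385, 3674, 1829, 909, 452]

ratios = [later / earlier for earlier, later in zip(sizes, sizes[1:])]
print([round(r, 3) for r in ratios])   # all close to 0.5

# Relative cost of the five subsets if a system of n unknowns costs ~ n^2
# (assumed sparse-solver model), normalised by the cost of the first subset
relative_cost = sum((n / sizes[0]) ** 2 for n in sizes)
print(round(relative_cost, 3))         # stays below the geometric bound 4/3

# Heuristic number k of subsets such that N_4 / 2^k < 1, with q = 3^5
q = 3 ** 5
N4 = (q ** 4 - q ** 2) // 4            # monic irreducible quartics over F_q
k = N4.bit_length()                    # smallest k with 2^k > N4
print(N4, k, N4 / 2 ** k)
```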

For the descent phase, we followed the state of the art and were able to express
the sought discrete logarithm using a total of under 41 million polynomials of
degree 4 (and of course also polynomials of lower degree). For lack of space, we
leave out the details; they will be reported in the extended version of this article.
The total running time of the computation was under 8600 CPU-hours.

6 Conclusion 

In this paper, we proposed an improved Frobenius Representation algorithm for
the computation of discrete logarithms in small characteristic. Besides
simplifying the description of previous algorithms, we reduce the
complexity of the precomputation phase to $O(q^6)$ for general extension degrees.
Computations with such a cost were previously available only for special degrees
such as Kummer extensions.
A Action of $\mathrm{PGL}_2(\mathbb{F}_q)$ on Polynomials

We detail here why we can restrict ourselves to the case $A(X) = X^D + a(X)$ and $B(X) = X^{D-1} + b(X)$, with $a$ and $b$ polynomials of degree at most $D-2$.

Assume that we are initially given an equation for two degree $D$ polynomials
$A_0$ and $B_0$. We may assume that these two polynomials are monic by multiplying
Equation (5) by the inverse of the product of their leading coefficients. Moreover,
thanks to Proposition 1 we have $[A_0, B_0]_D = [A_0, B_0 - A_0]_D$. Thus, we can
replace $B_0$ by $B_1 = B_0 - A_0$. If there is no unexpected fall of degree (i.e., in the
general case), $B_1$ has degree $D-1$. We can again assume that it is monic. If the
coefficient of $X^{D-1}$ in $A_0$ is $a_{D-1}$, remarking that

$$[A_0, B_1]_D = [A_0 - a_{D-1} B_1, B_1]_D,$$

we can replace $A_0$ by a polynomial $A_1$ whose coefficient of $X^{D-1}$ is 0. Thus, the
pair $(A_1, B_1)$ generates the same equation as $(A_0, B_0)$ and has the announced
restricted form.
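The normalization can be sketched as plain polynomial arithmetic. This sketch assumes a prime field $\mathbb{F}_p$ (the text works over $\mathbb{F}_q$), uses hypothetical helper names, and does not implement the bracket $[A,B]_D$ itself; the comments only record which identity justifies each step. Monicity is obtained by scaling each polynomial separately, which corresponds to multiplying the equation by a nonzero constant.

```python
def trim(f):
    """Drop zero leading coefficients (lists are lowest degree first)."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def monic(f, p):
    f = trim(c % p for c in f)
    inv = pow(f[-1], -1, p)            # modular inverse (Python 3.8+)
    return [(c * inv) % p for c in f]


def sub(f, g, p):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return trim((a - b) % p for a, b in zip(f, g))


def normalize_pair(A0, B0, p):
    """Return (A1, B1) with A1 = X^D + a(X), B1 = X^(D-1) + b(X), deg a, b <= D-2."""
    A0, B0 = monic(A0, p), monic(B0, p)
    D = len(A0) - 1
    B1 = sub(B0, A0, p)                # uses [A0, B0]_D = [A0, B0 - A0]_D
    if len(B1) - 1 != D - 1:
        raise ValueError("unexpected fall of degree")
    B1 = monic(B1, p)
    a_top = A0[D - 1]                  # coefficient of X^(D-1) in A0
    # uses [A0, B1]_D = [A0 - a_top * B1, B1]_D
    A1 = sub(A0, [(a_top * c) % p for c in B1], p)
    return A1, B1


# Over F_7: A0 = 2X^3 + 3X^2 + 2X + 1, B0 = 3X^3 + 2X^2 + 4
print(normalize_pair([1, 2, 3, 2], [4, 0, 2, 3], 7))
```

For this input the result is $A_1 = X^3 + 2X + 2$ and $B_1 = X^2 + 4X + 6$; an input where $B_0 - A_0$ drops below degree $D-1$ raises the "unexpected fall of degree" error instead.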

B Estimating Probabilities of Factoring Polynomials 

Throughout the paper, we need to estimate the probabilities that a polynomial 
of degree D factors into terms of degree at most d. This is often done by using 
the heuristic rule that the polynomial behaves in this respect like a random 
polynomial. 

In this appendix, we analyze these probabilities for random polynomials. Let
us start with a simple example and consider the probability that a random monic
polynomial of degree $D$ splits into linear factors. Over the finite field $\mathbb{F}_q$ there
are $q^D$ distinct monic polynomials of degree $D$. Among those, it is easy to count
the squarefree polynomials that split into linear terms: they are
in correspondence with their $D$ distinct roots in $\mathbb{F}_q$, thus there are precisely
$\binom{q}{D} = \frac{q(q-1)\cdots(q-D+1)}{D!}$ such polynomials. Hence, the fraction of polynomials
that split is lower bounded by $\binom{q}{D} \cdot q^{-D}$, which tends to $1/D!$ as $q$ tends to
infinity.

To obtain an upper bound, we also need to count the polynomials that split
and have multiple roots. The formula is more complex since we need to compute
a sum over partitions of $D$ into multiplicities. However, the number of terms in
this sum is independent of $q$, and each term is a multinomial that chooses the
correct number of roots with each multiplicity. Since each term involves at most
$D-1$ distinct roots, we can upper bound the contribution by $C(D)\, q^{D-1}$, where $C(D)$
does not depend on $q$. Thus, as $q$ tends to infinity, the upper bound on the total
fraction of polynomials that split tends to $1/D!$ too.
For more complex decompositions, this kind of analysis remains doable but
messy for arbitrary fixed values of $D$ and $d$. Thankfully, in the present paper,
we only consider values such that:

$$d + 1 > D/2.$$

Under this constraint the analysis becomes quite easy. Indeed, if a polynomial $P$
of degree $D$ does not factor into terms of degree at most $d$, it must have at least
one factor $F_k$ of large degree $k \ge d+1$. Since $k > D/2$, this factor is unique. Now,
the probability that $P$ can be written as $F_k \cdot Q$, with $F_k$ an irreducible of degree $k$
and $Q$ an arbitrary polynomial of degree $D-k$, is precisely $(N_k \cdot q^{D-k})/q^D = N_k/q^k$,
where $N_k$ denotes the number of irreducible polynomials of degree $k$ over $\mathbb{F}_q$.
Thus, the probability is precisely the fraction of irreducibles among degree $k$
polynomials, and it is well known that this tends to $1/k$ as $q$ tends to infinity. As
a consequence, as $q$ tends to infinity, the probability that a degree $D$ polynomial
factors into terms of degree at most $d$, when $d + 1 > D/2$, tends to:

$$1 - \sum_{k=d+1}^{D} \frac{1}{k}.$$

Using this we can easily estimate the probabilities required in the paper: 

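- For $D = 3$ and $d = 2$ the probability is $1 - \frac{1}{3} = \frac{2}{3}$.
- For $D = 4$ and $d = 2$ the probability is $1 - \frac{1}{3} - \frac{1}{4} = \frac{5}{12} \approx 0.4167$.
- For $D = 7$ and $d = 3$ the probability is $1 - \frac{1}{4} - \frac{1}{5} - \frac{1}{6} - \frac{1}{7} = \frac{101}{420} \approx 0.2405$.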
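The limit formula and the three values above can be checked with exact rational arithmetic. The sketch also evaluates the finite-$q$ expression $1 - \sum_{k=d+1}^{D} N_k/q^k$ that the argument yields before taking the limit, with $N_k$ computed by Gauss's formula; the function names are ours.

```python
from fractions import Fraction


def mobius(n):
    result, m, p = 1, n, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            result = -result
        p += 1
    return -result if m > 1 else result


def num_monic_irreducibles(q, k):
    """Gauss's formula for the number N_k of monic irreducibles of degree k."""
    return sum(mobius(d) * q ** (k // d) for d in range(1, k + 1) if k % d == 0) // k


def limit_probability(D, d):
    """Limit as q -> infinity of Pr[degree-D polynomial factors into degrees <= d]."""
    if not 2 * (d + 1) > D:
        raise ValueError("requires d + 1 > D/2")
    return 1 - sum(Fraction(1, k) for k in range(d + 1, D + 1))


def exact_probability(q, D, d):
    """Same probability for a finite q: 1 - sum of N_k / q^k over the large degrees k."""
    if not 2 * (d + 1) > D:
        raise ValueError("requires d + 1 > D/2")
    return 1 - sum(Fraction(num_monic_irreducibles(q, k), q ** k)
                   for k in range(d + 1, D + 1))


for D, d in [(3, 2), (4, 2), (7, 3)]:
    lim = limit_probability(D, d)
    print(D, d, lim, float(lim), float(exact_probability(3 ** 5, D, d)))
```

With $q = 3^5$ the finite-$q$ values already agree with the limits to about four decimal places.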



Big Bias Hunting in Amazonia: Large-Scale 
Computation and Exploitation of RC4 Biases 
(Invited Paper) 

Kenneth G. Paterson¹, Bertram Poettering¹, and Jacob C.N. Schuldt²

1 Information Security Group, Royal Holloway, University of London, U.K. 

2 Research Institute for Secure Systems, AIST, Japan 


Abstract. RC4 is (still) a very widely-used stream cipher. Previous 
work by AlFardan et al. (USENIX Security 2013) and Paterson et al. 
(FSE 2014) exploited the presence of biases in the RC4 keystreams to 
mount plaintext recovery attacks against TLS-RC4 and WPA/TKIP. 
We improve on the latter work by performing large-scale computations
to obtain accurate estimates of the single-byte and double-byte distributions
in the early portions of RC4 keystreams for the WPA/TKIP
context and by then using these distributions in a novel variant of the
previous plaintext recovery attacks. The distribution computations were
conducted using the Amazon EC2 cloud computing infrastructure and
involved the coordination of $2^{13}$ hyper-threaded cores running in parallel
over a period of several days. We report on our experiences of computing
at this scale using commercial cloud services. We also study Microsoft’s 
Point-to-Point Encryption protocol and its use of RC4, showing that it 
is also vulnerable to our attack techniques. 

Keywords: RC4, plaintext recovery attack, WPA, TKIP, MPPE. 


1 Introduction 


1.1 RC4 and Its Applications 

The stream cipher RC4, originally designed by Ron Rivest, is a beautifully compact
and fast algorithm. It became public in 1994 and has since been applied in
a very wide variety of secure communications protocols, including SSL/TLS (as
analysed in mam); WEP 0 (where its particular usage led to devastating
attacks including complete, efficient key recovery, see [20] for a summary and
references); WPA [6] (as analysed in [21,20,22,15,18]); Microsoft’s Point-to-Point
Encryption protocol [14] (MPPE, as analysed here); and some Kerberos-related
encryption modes [8]. A selection of additional, non-protocol-specific analyses of
RC4 can be found in [3,2,12,11,10,19].

Of particular relevance for this work are the results of AlFardan et al. [1]. They
introduced a simple, Bayesian statistical method that recovers plaintexts that
are repeatedly encrypted under RC4 by exploiting biases in RC4 keystreams.
Their approach was successfully applied to RC4 in HTTPS (i.e., HTTP over
SSL/TLS), where a fresh pseudorandom 128-bit key is used for each SSL/TLS
connection, and where the repeated encryption of HTTP cookies can be arranged
by having malicious JavaScript running in the target user’s browser.


1.2 RC4 in WPA/TKIP 

The work of AlFardan et al. motivated us to explore RC4’s usage in other
deployed protocols, in an attempt to determine whether similar weaknesses exist
and are exploitable. Our first focus was the wireless network encryption protocol
WPA/TKIP [||], with results presented in [15]. While WPA/TKIP was only ever
intended as a stop-gap to replace WEP until stronger cryptography could be
deployed, a recent survey [22] showed that it is still in widespread use.

In WPA/TKIP, fresh 16-byte (128-bit) RC4 keys are used for every frame
transmitted on the wireless network, but the first three bytes of the key are
determined by two bytes $(\mathrm{TSC}_0, \mathrm{TSC}_1)$ of a public value, TSC, which
increments on a frame-by-frame basis; the remaining 13 bytes of the per-frame key
are generated pseudorandomly. As observed in [15] and independently in [18],
the dependence of the RC4 key on TSC in turn induces large, TSC-dependent,
single-byte biases in the initial positions of RC4 keystreams. This suggests the
attack proposed in [15]: bin the available ciphertexts into $2^{16}$ bins, one bin for
each possible value of TSC; perform a Bayesian analysis as per [1] for each bin;
and then combine the results across all the bins to estimate the likelihood of
each plaintext byte candidate. But this attack requires the computation of
accurate single-byte distributions for RC4 keystreams for each of the $2^{16}$ values
of TSC. We estimated in [15] that the analysis of $2^{32}$ to $2^{40}$ RC4 keystreams per
TSC would be needed to achieve sufficient accuracy, for a total of $2^{48}$ to $2^{56}$ RC4
keystreams. At that time, this was well beyond our computational capabilities.
We resorted to working with $2^{24}$ keystreams per TSC and using the sub-optimal
procedure of examining the dependence of the RC4 keystream only on $\mathrm{TSC}_1$, in
effect aggregating over $\mathrm{TSC}_0$ (since our intuition was that $\mathrm{TSC}_1$ would have
a greater influence in determining the distribution than $\mathrm{TSC}_0$).
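The binned analysis can be sketched as follows for a single plaintext position. The sketch assumes that the keystream distribution of each bin is known exactly, that the ciphertext bytes are independent, and that all 256 plaintext candidates are a priori equally likely, so the Bayesian analysis reduces to maximizing a summed log-likelihood; it follows the counting idea of [1] but is our illustration with hypothetical function names, not the authors' code.

```python
import math


def log_likelihoods(ciphertexts_by_bin, dist_by_bin):
    """Log-likelihood of each plaintext candidate mu (0..255) at one fixed position.

    ciphertexts_by_bin: bin id (e.g. a TSC value) -> list of ciphertext bytes
    dist_by_bin:        bin id -> 256 keystream-byte probabilities for that bin
    """
    scores = [0.0] * 256
    for b, cts in ciphertexts_by_bin.items():
        counts = [0] * 256
        for c in cts:
            counts[c] += 1
        logp = [math.log(pk) for pk in dist_by_bin[b]]
        for mu in range(256):
            # plaintext mu and ciphertext c imply keystream byte c XOR mu
            scores[mu] += sum(n * logp[c ^ mu] for c, n in enumerate(counts) if n)
    return scores


def recover_byte(ciphertexts_by_bin, dist_by_bin):
    """Maximum-likelihood candidate, combining the evidence of all bins."""
    scores = log_likelihoods(ciphertexts_by_bin, dist_by_bin)
    return max(range(256), key=lambda mu: scores[mu])
```

Because every bin contributes its own distribution, a bias that points to a different keystream value in each bin still adds up coherently for the correct plaintext candidate.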

Another avenue left unexplored in [15] for WPA/TKIP was the use of double-byte
biases in plaintext recovery attacks. Such biases concern the distribution
of adjacent pairs of keystream bytes. They were used in [1] in the preferred
attack against SSL/TLS, because these biases persist throughout the
RC4 keystream (whereas the single-byte biases disappear shortly after position
256) and, in the considered attack scenario, it was not possible to arrange for
the target plaintext bytes (an HTTP cookie) to appear sufficiently early in the
sequence of plaintext bytes. It is also possible that using a double-byte bias attack
would improve plaintext recovery rates in the early positions. Extending the
double-byte bias attack of [1] to the WPA/TKIP setting would then require
the computation of the double-byte keystream distributions, ideally on a per-TSC
basis. This would not only require enormous numbers of keystreams to obtain
sufficient accuracy, but also significant storage: just describing the double-byte
distribution per position and TSC requires $2^{16}$ numbers, each typically 32 bits
in size, leading to a total storage requirement of 8 Terabytes just to record the
double-byte distributions for the first 512 keystream positions.
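The storage figures follow from multiplying out the table dimensions. The sketch reads "Terabyte" and "GB" as powers of two, which is our interpretation; it also recovers the 16GB per position mentioned in Section 3.1 and the effect of aggregating over $\mathrm{TSC}_0$.

```python
KiB, GiB, TiB = 2 ** 10, 2 ** 30, 2 ** 40

TSC_VALUES = 2 ** 16     # one distribution per (TSC_0, TSC_1) pair
POSITIONS = 512          # first 512 keystream positions
PAIR_VALUES = 2 ** 16    # possible values of two adjacent keystream bytes
COUNTER_BYTES = 4        # 32-bit counters

double_byte_total = TSC_VALUES * POSITIONS * PAIR_VALUES * COUNTER_BYTES
double_byte_one_position = TSC_VALUES * PAIR_VALUES * COUNTER_BYTES
double_byte_tsc0_aggregated = 2 ** 8 * POSITIONS * PAIR_VALUES * COUNTER_BYTES

print(double_byte_total / TiB)             # 8.0
print(double_byte_one_position / GiB)      # 16.0
print(double_byte_tsc0_aggregated / GiB)   # 32.0
```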

1.3 RC4 in MPPE 

Microsoft’s Point-to-Point Encryption (MPPE), as specified in [14,23], is a
venerable security protocol that can be used on top of the Point-to-Point Tunnelling
Protocol (PPTP). The latter is itself a general-purpose protocol encapsulation 
method that is commonly used for providing Virtual Private Networking services 
to devices running Microsoft operating systems, including Windows 8 and the 
Windows Server family of products. 

MPPE uses RC4 with a non-standard method for selecting keys. For example,
when a 40-bit key is used, MPPE starts with an 8-byte key $K = (K_0, \ldots, K_7)$ that
is itself derived by hashing a user password, an authentication protocol challenge,
and other public information. MPPE then sets $K_0$ = 0xD1, $K_1$ = 0x26, $K_2$ = 0x9E.
It is then natural to ask: does this method for selecting keys in MPPE lead to
a different bias structure in its RC4 keystreams, and does this help or hinder
plaintext recovery attacks akin to those of [1]?


1.4 Our Contributions and Paper Organisation 

Section 2 provides further background on the RC4 stream cipher and its use in
WPA and MPPE. 

In Section 3 we report on our computations of more refined, per-TSC, single-byte
and double-byte RC4 keystream distributions for WPA/TKIP. In slightly
more detail, we computed these distributions for the first 512 keystream bytes,
based on $2^{48}$ keys for the single-byte case and $2^{46}$ keys for the double-byte
case. We made use of the Amazon Elastic Compute Cloud (Amazon EC2)¹,
which is part of Amazon Web Services, to perform the computations. We used
approximately 30 virtual-core-years for the single-byte computation and 33
virtual-core-years for the double-byte computation. Since, to us, a total of 63
virtual-core-years was quite a significant amount of computation (costing roughly
US$4110) and because we faced a number of obstacles in working at this scale,
we report in some detail on our experiences of working with Amazon EC2. One
notable feature revealed by our large-scale computations is the presence of TSC-dependent,
single-byte biases well beyond position 256 in the RC4 keystream.

Section 4 describes a plaintext recovery attack on WPA/TKIP that exploits
our newly computed and more accurate single-byte distributions for RC4
keystreams, comparing it to our previous results from [15].

Section 5 describes a novel plaintext recovery attack on WPA/TKIP that
exploits per-TSC, double-byte biases in RC4 keystreams. This attack combines
the double-byte bias attack from [1] with the idea of binning that was developed
for the case of single-byte biases in [15].


1 http://aws.amazon.com/ec2/ 

2 Here, and throughout, we quote prices exclusive of sales taxes at 20%. 
Algorithm 1. RC4 KSA
input : key K of l bytes
output: initial internal state st_0
begin
    for i = 0 to 255 do
        S[i] ← i
    j ← 0
    for i = 0 to 255 do
        j ← j + S[i] + K[i mod l]
        swap(S[i], S[j])
    i, j ← 0
    st_0 ← (i, j, S)
    return st_0

Algorithm 2. RC4 PRGA
input : internal state st_r
output: keystream byte Z_{r+1}
        updated internal state st_{r+1}
begin
    parse (i, j, S) ← st_r
    i ← i + 1
    j ← j + S[i]
    swap(S[i], S[j])
    Z_{r+1} ← S[S[i] + S[j]]
    st_{r+1} ← (i, j, S)
    return (Z_{r+1}, st_{r+1})

Fig. 1. Algorithms implementing the RC4 stream cipher. All additions are performed 
modulo 256. 

In Section 6 we report on the single-byte keystream distributions for RC4
when it is keyed according to the MPPE specification. In short, we found the
distributions to be highly skewed and amenable to exploitation using our attack
techniques.

Finally, Section 7 presents our conclusions and remarks on open problems.

2 Further Background 

2.1 The RC4 Stream Cipher 

Technically, RC4 consists of two algorithms: a key scheduling algorithm (KSA)
and a pseudo-random generation algorithm (PRGA), which are specified in
Algorithms 1 and 2. The KSA takes as input a key $K$, typically a byte-array of
length between 5 and 32 (i.e., 40 to 256 bits), and produces the initial internal
state $st_0 = (i, j, S)$, where $S$ is the canonical representation of a permutation
on the set $[0, 255]$ as an array of bytes, and $i$, $j$ are indices into this array. Given
an internal state $st_r$, the PRGA outputs the next keystream byte $Z_{r+1}$,
together with the updated internal state $st_{r+1}$.
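Algorithms 1 and 2 translate directly into Python. This is an unoptimized illustrative sketch, not the OpenSSL implementation used for the large-scale experiments in Section 3.2; the printed values are the widely published RC4 test vectors for the key "Key".

```python
def rc4_ksa(key):
    """Algorithm 1 (KSA): returns the permutation S; i and j restart at 0."""
    l = len(key)                       # key K of l bytes
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % l]) % 256
        S[i], S[j] = S[j], S[i]
    return S


def rc4_keystream(key, n):
    """Algorithm 2 (PRGA) iterated n times: returns Z_1, ..., Z_n."""
    S = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def rc4_xor(key, data):
    """Encryption and decryption are the same operation: XOR with the keystream."""
    return bytes(d ^ z for d, z in zip(data, rc4_keystream(key, len(data))))


print(rc4_keystream(b"Key", 10).hex())      # eb9f7781b734ca72a719
print(rc4_xor(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```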

2.2 WPA/TKIP 

A detailed description of how RC4 is used in the WPA/TKIP context is given
in [15]. In short, WPA/TKIP generates a fresh 128-bit key $K = (K_0, \ldots, K_{15})$ for
RC4 for each frame that is transmitted; the key is a function of the temporal
encryption key TK (128 bits), the TKIP sequence counter TSC (48 bits), and
the transmitter address TA (48 bits). A single value of TK is used over many
frames, while TSC increments from frame to frame; meanwhile TA is fixed. Very
importantly, the function used to compute $K$ adds a specific structure to
“preclude the use of known RC4 weak keys” [6]. More precisely, writing
$\mathrm{TSC} = (\mathrm{TSC}_0, \mathrm{TSC}_1, \ldots, \mathrm{TSC}_5)$, we have

$$K_0 = \mathrm{TSC}_1, \qquad K_1 = (\mathrm{TSC}_1 \mid \mathtt{0x20}) \mathbin{\&} \mathtt{0x7f}, \qquad K_2 = \mathrm{TSC}_0 \tag{1}$$

while $K_3, \ldots, K_{15}$ can be considered to be pseudorandom functions of TK, TSC
and TA. Notably, bytes $K_0, K_1, K_2$ depend only on bytes $\mathrm{TSC}_0$ and $\mathrm{TSC}_1$ of TSC.
Moreover, the bits of $\mathrm{TSC}_1$ are used twice. So the bytes of $K$ have more structure
than they would if they were chosen with uniform distribution. The per-frame
key $K$ is then used to produce an RC4 keystream, following our description of
RC4 above. The TKIP plaintext (consisting of the frame payload, a 64-bit MAC
value MIC, and a 32-bit Integrity Check Vector ICV) is then XORed byte by byte
with the RC4 keystream.
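Equation (1) can be written as a small key-construction helper. In this sketch the bytes $K_3, \ldots, K_{15}$ are modelled as uniformly random bytes, a stand-in for the actual TKIP key-mixing function of TK, TSC and TA, which is not reproduced here; the name `tkip_like_key` is hypothetical.

```python
import os


def tkip_like_key(tsc0, tsc1, tail=None):
    """Per-frame 16-byte RC4 key whose first three bytes follow Eq. (1).

    tail: the remaining 13 bytes; random by default (a modelling assumption).
    """
    if tail is None:
        tail = os.urandom(13)
    if len(tail) != 13:
        raise ValueError("expected 13 tail bytes")
    k0 = tsc1
    k1 = (tsc1 | 0x20) & 0x7F   # bit 5 forced to 1, bit 7 forced to 0
    k2 = tsc0
    return bytes([k0, k1, k2]) + bytes(tail)


# K_1 can only take 2^6 = 64 distinct values, and both K_0 and K_1 come from TSC_1
print(len({(t | 0x20) & 0x7F for t in range(256)}))   # 64
```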

2.3 MPPE 

MPPE provides a confidentiality service over PPTP using the RC4 algorithm.
Keys for the RC4 algorithm come from a separate authentication and key
establishment protocol, such as MS-CHAPv1, MS-CHAPv2 or EAP-TLS; the
first two of these were broken in m and [4], respectively, leading to the deprecation
of the first and the recommendation to use the second only with additional
protection from PEAP³.

RFC 3079 [23] describes in detail how the keys used in MPPE’s instantiation of
RC4 are derived from preceding authentication and key establishment protocols.
According to [23], three different RC4 key lengths are supported: 40-bit, 56-bit
and 128-bit. When a 40-bit key is used, MPPE starts with an 8-byte key
$K = (K_0, \ldots, K_7)$ that is itself derived by hashing the password, the authentication
protocol challenge, and other public information. MPPE then overwrites $K_0$ =
0xD1, $K_1$ = 0x26, $K_2$ = 0x9E. When a 56-bit key is used, the protocol starts with
the same 8-byte key and then sets $K_0$ = 0xD1; when a 128-bit key is used, a
similar procedure involving password and challenge hashing is used to generate
a 16-byte key $K$, and no bytes of $K$ are overwritten.
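The byte overwriting described above can be mirrored in a few lines. The sketch takes an already derived 8- or 16-byte key as input (the hashing and RC4 re-keying steps are not modelled), and the names are hypothetical; RFC 3079 remains the authoritative description. It also shows that the number of key bits left unfixed matches the nominal key length in each mode.

```python
# Bytes that MPPE writes over the start of the derived key, per key length
OVERWRITE = {40: bytes([0xD1, 0x26, 0x9E]), 56: bytes([0xD1]), 128: b""}
KEY_BYTES = {40: 8, 56: 8, 128: 16}


def mppe_rc4_key(derived_key, key_bits):
    """Apply the fixed-byte overwriting to an already derived key.

    derived_key stands for the output of the hashing steps, which are not modelled.
    """
    if len(derived_key) != KEY_BYTES[key_bits]:
        raise ValueError("wrong key length for this mode")
    prefix = OVERWRITE[key_bits]
    return prefix + derived_key[len(prefix):]


for bits in (40, 56, 128):
    free_bytes = KEY_BYTES[bits] - len(OVERWRITE[bits])
    print(bits, "unfixed key bits:", 8 * free_bytes)   # 40, then 56, then 128
```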

Furthermore, MPPE operates in two modes, with the mode in use being
determined by a PPTP header field. In stateless mode, the RC4 key is refreshed
and the cipher restarted for each PPTP packet sent. By contrast, in stateful
mode, the RC4 key is refreshed only every 256 packets. In both cases, refreshing
the key involves hashing the old key with the first key for the session (called
StartKey in [14]) to generate a value InterimKey, then an RC4 encryption step
in which InterimKey is used to encrypt itself to generate a key $K$ of either 8
or 16 bytes, and finally overwriting bytes as described above. See [14, Section 7] for
details.

From the above description it may be remarked that, while the hashing and 
encryption steps used in deriving the RC4 keys may be intended to render them 
pseudorandom, in the 40-bit and 56-bit cases, they have additional structure 

3 https://technet.microsoft.com/library/security/2743314
that may be expected to lead to additional and/or different biases in the RC4
keystream as compared to the 128-bit case. Further, the use of stateless mode
would mean a fresh RC4 key (with additional structure in the 40-bit and 56-bit
cases) for every packet sent. These observations mean that MPPE in stateless
mode can be expected to be vulnerable to plaintext recovery attacks similar to
those developed in m. Since the protocol encapsulated by MPPE is likely to
be IP, similar fields to those identified in im could be targeted. We note that
while key lengths of 40 and 56 bits are small enough that a simple brute-force
search might initially seem more efficient than mounting a bias analysis
using our techniques, in the stateless case such a brute-force search would only
recover the key used for a single packet. Moreover, a basic analysis suggests that
a $2^{64}$ attack would be needed to recover StartKey, from which all keys in the
session are derived. So our approach may be an attractive alternative if specific
plaintext bytes are targeted for recovery.

3 Large-Scale Computation of RC4 Keystream 
Distributions for WPA/TKIP Keys 

3.1 Computing Keystream Distributions and Finding New Biases 

As noted in the introduction, in our previous work on WPA/TKIP in [15], we
worked with a total of only $2^{40}$ keystreams and only with single-byte distributions
for the first 256 positions. In an effort to further improve our attacks, we
decided to perform larger-scale computations using, in addition to our own local
resources, the Amazon EC2 cloud computing infrastructure to estimate both the
single-byte and double-byte keystream distributions for the first 512 positions,
on a per-TSC basis.

Because the double-byte biases are smaller than the single-byte ones (typically
by a factor of roughly $2^8$), many more keystreams would be needed to accurately
estimate double-byte distributions than single-byte ones. However, we chose
to focus our effort on the single-byte case here, computing distributions based
on $2^{32}$ keystreams per TSC in the single-byte case and based on $2^{30}$ keystreams
per TSC in the double-byte case. The reasons for this focus are as follows. Using
our local computational resources, we determined that it would be difficult to
use the full per-TSC distributions in a double-byte attack akin to that of [1]
because of the complexity of handling so much data when running attacks (for
example, we would need to deal with 16GB of distribution data and perform
$2^{48}$ multiplications of real numbers to analyse a single byte position). Rather,
an aggregated approach seemed more likely to be feasible for the double-byte
setting. Here our idea was to start with $2^{30}$ keystreams per TSC and combine the
$2^8$ distributions for each $\mathrm{TSC}_1$ value (called $\mathrm{TSC}_0$-aggregation in [15]) to obtain
$2^8$ different double-byte distributions, one per $\mathrm{TSC}_1$ value, each distribution now
based on $2^{38}$ keystreams. This not only boosts the number of keystreams per
distribution estimate (good for accurately estimating biases), but also reduces
both the size of the distribution data and the computation by a factor of 256 (making
simulation of attacks much more feasible).
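Per-TSC estimation and $\mathrm{TSC}_0$-aggregation amount to counting keystream bytes and adding count tables. To keep it small, this sketch uses single-byte tables (the double-byte case indexes by the $2^{16}$ possible byte pairs instead), models keys as in Eq. (1) with the remaining bytes drawn at random, and uses toy sizes of our choosing; in pure Python it is many orders of magnitude too slow for the real $2^{48}$-keystream computation.

```python
import os


def rc4_keystream(key, n):
    """RC4 as in Algorithms 1 and 2, returning the first n keystream bytes."""
    S, j, l = list(range(256)), 0, len(key)
    for i in range(256):
        j = (j + S[i] + key[i % l]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


def tkip_like_key(tsc0, tsc1):
    """First three bytes per Eq. (1); the other 13 bytes are modelled as random."""
    return bytes([tsc1, (tsc1 | 0x20) & 0x7F, tsc0]) + os.urandom(13)


def single_byte_counts(tsc0, tsc1, n_keys, positions):
    """counts[r][v] = number of sampled keystreams whose byte Z_{r+1} equals v."""
    counts = [[0] * 256 for _ in range(positions)]
    for _ in range(n_keys):
        z = rc4_keystream(tkip_like_key(tsc0, tsc1), positions)
        for r, v in enumerate(z):
            counts[r][v] += 1
    return counts


def tsc0_aggregate(tables):
    """Add up the count tables of several TSC_0 values sharing one TSC_1 value."""
    positions = len(tables[0])
    total = [[0] * 256 for _ in range(positions)]
    for counts in tables:
        for r in range(positions):
            for v in range(256):
                total[r][v] += counts[r][v]
    return total


# Toy run: 3 TSC_0 values (instead of 256) and 200 keys each (instead of 2^30)
tables = [single_byte_counts(t0, 0x10, 200, 4) for t0 in range(3)]
agg = tsc0_aggregate(tables)
print(sum(agg[0]))   # 600
```

Dividing each row of a table by the number of keystreams behind it gives the estimated distribution for that position.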
[Figure: panels (a) Position 260, (b) Position 270, (c) Position 300]

Fig. 2. Pictorial representation of biases in RC4 keystreams for random $\mathrm{TSC}_0$-aggregated
WPA/TKIP keys at keystream positions 260, 270, and 300, for different
$\mathrm{TSC}_1$ values (x-axis) and byte values (y-axis). At each point we encode the bias in the
keystream for the ($\mathrm{TSC}_1$, value) combination as a colour; precisely, we encode the
difference between the occurring probability and the (expected) probability 1/256, scaled
up by a factor of $2^{24}$ and capped to values in [-30, +30].


Our computations went well beyond those of prior RC4 cryptanalyses in scale
(e.g., BUS]), and indeed we were rewarded by discovering new $\mathrm{TSC}_1$-dependent
single-byte biases in positions all the way up to 512 (see Figure 2 for examples at
specific positions). The existence of these biases is surprising in view of the
behaviour of single-byte biases observed in previous works and, in principle, would
allow the recovery of plaintext using a single-byte attack like that presented in
[15] and Section 4 below. It is an open problem to determine how far into the
RC4 keystream these biases persist.

3.2 Reflections on Using Amazon EC2 

The task of computing accurate estimates of RC4 keystream distributions is
well-suited to distributed computation. In particular, in the case of WPA, the
probability distribution for each TSC value can be estimated independently by
generating keystreams using randomly chosen WPA keys for that TSC (having
the structure described in Section 2.2). This makes cloud services such as
Amazon EC2 appealing for the computation, since their virtually unlimited
computing capacity can provide the resources required to complete the
computation within an acceptable period of time.

For our computation of the per-TSC bias estimates, we used Amazon EC2 to
create 256 virtual machines of the type ‘c3.8xlarge’, each providing 32 ‘virtual’
cores. The underlying hardware of the virtual machines consisted of servers equipped
with Intel Xeon 2.8GHz processors. Note, however, that each of the cores of a
virtual machine corresponds to a hyper-threaded core of the underlying CPU, i.e.,
one ‘c3.8xlarge’ instance effectively corresponds to a machine with 16 physical
cores. To manage the virtual machines, we utilized boto⁴, which implements a
Python interface to Amazon EC2. This provided a simple and straightforward
4 http://github.com/boto/boto 
way to automate management of and access to the virtual machines, and made it
relatively easy to set up the execution of the computation using a combination
of Python and shell scripts. The virtual machines were all initialized with an
Ubuntu 13.10 image obtained through the AWS Marketplace⁵.

Each virtual machine was set up to compute the keystream distributions for
all $\mathrm{TSC}_0$ values given a fixed $\mathrm{TSC}_1$ value, and to split this computation equally
among the 32 available virtual cores. To make the WPA keystream generation
efficient, we used the RC4 implementation in OpenSSL⁶. However, experiments
showed that to reach the desired number of keystreams with our available budget,
further optimizations were required. Additional experiments revealed that the
amount of available cache in the underlying CPUs on which the virtual machines
were running, and how this cache was utilized, played an important role in the
performance of the keystream distribution generation. Specifically, the chance of
cache misses occurring when updating the keystream distribution statistics was
found to have a large influence on performance.

To address this, we used a combination of two different approaches to reduce
the chance of cache misses occurring. Firstly, to fit the array storing the
counters used to collect the statistics of the keystream distribution into the
cache memory, we “packed” multiple small-width counters into single 64-bit
integers and implemented logic for handling counter overflows. Secondly, instead
of updating the keystream distribution statistics after each keystream has been
generated, we stored multiple keystreams in memory before updating the
statistics. This implies that multiple updates of the statistics for a single position can
be done sequentially, which, assuming the appropriate memory layout, increases
the chance of a cache hit. While these optimizations only provided small gains
for the computation of single-byte biases, significant gains were achieved for the
computation of double-byte biases.
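Both cache-oriented optimizations can be sketched in a few lines. The counter width (16 bits), the number of lanes per word, and the spill-to-wide-counter overflow policy are assumptions made for illustration; the text does not say which widths or overflow logic were used. Python does not model caches, so this only shows the data layout and the update order, not the speed-up.

```python
MASK64 = (1 << 64) - 1
LANE_BITS = 16                    # assumed width of one packed counter
LANES = 64 // LANE_BITS           # counters per 64-bit word
LANE_MAX = (1 << LANE_BITS) - 1


class PackedCounters:
    """n small counters packed into 64-bit words; overflow spills into wide counters."""

    def __init__(self, n):
        self.words = [0] * ((n + LANES - 1) // LANES)
        self.spill = [0] * n

    def increment(self, idx):
        w, lane = divmod(idx, LANES)
        shift = lane * LANE_BITS
        if ((self.words[w] >> shift) & LANE_MAX) == LANE_MAX:
            # lane is full: move its value to the wide counter and clear the lane,
            # so the addition below never carries into the neighbouring lane
            self.spill[idx] += LANE_MAX
            self.words[w] &= ~(LANE_MAX << shift) & MASK64
        self.words[w] += 1 << shift

    def get(self, idx):
        w, lane = divmod(idx, LANES)
        return self.spill[idx] + ((self.words[w] >> (lane * LANE_BITS)) & LANE_MAX)


def update_from_batch(counters, keystreams, positions):
    """Buffered update: walk position by position over a batch of keystreams,
    so consecutive increments hit the same block of 256 counters."""
    for r in range(positions):
        base = 256 * r
        for ks in keystreams:
            counters.increment(base + ks[r])
```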

Single-Byte Computations. Using the above setup, each virtual core was capable
of generating and processing on average 294k WPA keystreams of length 512 per
second for single-byte distributions. Hence, computing the per-TSC single-byte
distributions based on 2^32 keystreams for each TSC value (i.e., 2^48
keystreams in total) took 9.56 × 10^8 virtual core seconds in total, or
approximately 30 virtual core years. Due to the large degree of parallelism in
our setup (256 virtual machines with 32 virtual cores each, i.e., 8192 virtual
cores), this corresponds to an actual running time of slightly more than 32
hours.
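These running-time figures follow from simple arithmetic. The sketch below, a back-of-the-envelope check rather than anything from the original tooling, assumes perfect load balancing over 256 × 32 = 8192 virtual cores and a 365.25-day year; the function name `campaign_figures` is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def campaign_figures(total_keystreams, rate_per_core, vms=256, cores_per_vm=32):
    """Return (core-seconds, core-years, wall-clock hours), assuming the work
    is spread perfectly evenly over vms * cores_per_vm virtual cores."""
    core_seconds = total_keystreams / rate_per_core
    core_years = core_seconds / SECONDS_PER_YEAR
    wall_hours = core_seconds / (vms * cores_per_vm) / 3600
    return core_seconds, core_years, wall_hours

# Single-byte campaign: 2^48 keystreams at about 294k keystreams/s per core.
cs, cy, wh = campaign_figures(2**48, 294_000)
print(f"{cs:.3g} core-seconds, {cy:.1f} core-years, {wh:.1f} hours")
```

With the rounded rate of 294k keystreams per second this gives roughly 9.57 × 10^8 core-seconds, about 30 core-years and about 32.5 hours, in line with the figures above up to rounding. The even-load assumption is exactly what a single slow virtual machine breaks.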

While each of the 256 virtual machines was set up identically, a single
virtual machine ran significantly slower than the others, and was only capable
of processing approximately 180k keystreams per second. We suspect that other
virtual machines running on the same underlying hardware might have affected
the performance of this virtual machine. Due to this issue, it took
approximately 52 hours to complete the computation of the single-byte
distributions.

At the time we did the experiments, the cost of running a single “c3.8xlarge”
virtual machine instance was US$2.40 per hour, leading to a cost of US$614 per 
hour when running all 256 instances simultaneously. 

5 http://aws.amazon.com/marketplace/

6 https://www.openssl.org/ 
To store the generated keystream distributions, we attached a separate Amazon
Elastic Block Store (EBS) volume to each virtual machine. This gave us the
option of terminating a virtual machine without erasing the generated data, and
furthermore allowed us to use a single virtual machine to inspect and process
all generated data by sequentially attaching the EBS volumes to this machine.
The latter provided a more cost-effective solution than running the virtual
machines in parallel, and a faster solution than resuming each virtual machine
sequentially. We stored the distribution for each TSC value as a sequence of
binary-encoded 32-bit integers, leading to a storage requirement of 512KB per
distribution (128MB per virtual machine), or 32GB in total. However, since the
minimum size of an EBS volume is 1GB, we allocated a total of 256GB of EBS
storage (note that a single EBS volume cannot be mounted by multiple virtual
machines simultaneously). The cost of EBS storage was US$0.05 per GB per month,
leading to a cost of just US$12.80 a month to maintain the EBS volumes.
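The storage figures can be checked the same way. This sketch assumes, consistently with the 512KB figure and the keystream length of 512, that each single-byte distribution covers 512 positions × 256 byte values stored as 32-bit counters, and it treats KB/MB/GB as binary units; the constant names are ours.

```python
COUNTER_BYTES = 4        # each count stored as a binary-encoded 32-bit integer
POSITIONS = 512          # keystream positions per distribution
BYTE_VALUES = 256        # possible values of Z_r
TSC_PER_VM = 256         # one VM covers all TSC0 values for a fixed TSC1
VMS = 256
EBS_USD_PER_GB_MONTH = 0.05

per_tsc = POSITIONS * BYTE_VALUES * COUNTER_BYTES
per_vm = per_tsc * TSC_PER_VM
total = per_vm * VMS
# Billing follows the provisioned 1GB minimum per volume, not the 128MB used.
monthly_usd = VMS * 1 * EBS_USD_PER_GB_MONTH

print(per_tsc // 2**10, "KB per TSC;", per_vm // 2**20, "MB per VM;",
      total // 2**30, "GB in total; US$", round(monthly_usd, 2), "per month")
```

The provisioned 256GB, rather than the 32GB actually used, is what drives the monthly bill of 256 × US$0.05 = US$12.80.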

Double-Byte Computations. Working with double-byte keystream distributions
introduced significant overheads compared to the single-byte case, both in
terms of computation and storage. With the previously mentioned optimizations,
each virtual core was capable of processing on average 67k WPA keystreams per
second. Hence, computing the per-TSC double-byte keystream distributions based
on 2^30 keystreams for each TSC value (i.e., 2^46 keystreams in total) took
1.05 × 10^9 virtual core seconds in total, or approximately 33 virtual core
years. In our setup, this corresponds to an actual running time of slightly
more than 34 hours, but because the virtual machines were initialized
sequentially and one virtual machine encountered an issue, the computation took
approximately 48 hours to complete. More specifically, the virtual machine in
question was reset and rebooted during the computations, and hence did not
complete its assigned task. We were unable to identify the cause of this event,
and simply restarted the relevant computations manually.

As for the single-byte distributions, we created separate EBS volumes to store
the double-byte distributions. However, each TSC-specific double-byte
distribution requires 128MB of storage when stored as a sequence of 32-bit
integers, leading to a storage requirement of 32GB per virtual machine, or 8TB
in total. This increased storage overhead not only led to an increased cost
(US$410 per month), but also created additional practical issues which we had
to handle. For example, since EBS volumes are implemented via network attached
storage (NAS), writing the distribution data to an EBS volume caused a
significant delay in some instances. In particular, we observed that
immediately after completion of the keystream distribution generation,
detaching an EBS volume might not succeed, which in turn could interrupt the
shutdown of a virtual machine. Furthermore, the 8TB size of the dataset made it
more difficult to make all data available to a single machine at the same time,
which is required to run attack simulations efficiently. We decided to transfer
the complete dataset to our local storage array, both to run the attack
simulations and to store the data permanently. For this purpose, we used bbcp,⁷
which is capable of transferring large amounts of data between networked
computers using multiple TCP streams and large transfer windows. This allowed
us to obtain a transfer speed of approximately 50MB per second, leading to a
total transfer time of slightly more than 48 hours. Note that data transfers out
of Amazon EC2 were charged at US$0.12 per GB, resulting in a US$983 cost to
move the complete 8TB dataset to our local storage.
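The double-byte storage and transfer costs follow from the same kind of arithmetic. The sketch assumes 32-bit counters and 512 position pairs per TSC. That pair count is not stated above; it is the value implied by 128MB per TSC-specific distribution, since 2^27 bytes / (2^16 byte pairs × 4 bytes) = 2^9. The prices are the ones quoted in the text.

```python
COUNTER_BYTES = 4            # 32-bit counters
POSITION_PAIRS = 512         # inferred from 128MB per TSC (see lead-in)
BYTE_PAIRS = 256 * 256       # possible values of (Z_r, Z_{r+1})
TSC_VALUES = 2**16
EBS_USD_PER_GB_MONTH = 0.05
EGRESS_USD_PER_GB = 0.12

per_tsc = POSITION_PAIRS * BYTE_PAIRS * COUNTER_BYTES
total = per_tsc * TSC_VALUES
total_gb = total / 2**30
monthly_usd = total_gb * EBS_USD_PER_GB_MONTH
egress_usd = total_gb * EGRESS_USD_PER_GB

print(per_tsc // 2**20, "MB per TSC;", total // 2**40, "TB in total;",
      f"US${monthly_usd:.0f}/month storage, US${egress_usd:.0f} egress")
```

Both results (about US$410 per month and about US$983 to move the data out) scale linearly with the dataset size, which is why the 2^8-fold larger double-byte tables dominated the storage budget.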

Our experience of using Amazon EC2 to compute estimates of the per-TSC 
biases suggests that Amazon provides a flexible platform which is well suited to 
perform this type of computation, and that the practical difficulties arising in 
the distribution of the computation can be overcome with moderate effort. 

4 Plaintext Recovery Attacks against WPA/TKIP Based on Single-Byte Biases

4.1 The Attack of Paterson, Poettering and Schuldt [15]

The attack against WPA/TKIP in [15] builds on the single-byte bias attack (on
TLS) of [1]. Both attacks work in the setting where the same plaintext is
encrypted many times under different RC4 keys to obtain a set of ciphertexts.
The key idea of both attacks is that, in any given position r of the ciphertext
stream, a guess for the repeated plaintext byte in that position induces a
distribution on the keystream in position r, via XORing the guess with byte r
of each of the ciphertexts in turn. This induced distribution can be compared
to the known distribution in keystream position r (which is obtained by
sampling), and the plaintext guess giving the “best fit” is selected as the
attack’s output for position r. This is formalised as a Bayesian procedure,
leading to the output in position r being the plaintext candidate that
maximises the probability of observing the induced keystream distribution in
position r.

The innovation in [15] (and independently observed in [18]) was to recognise
that in WPA/TKIP a different keystream distribution can, and should, be used
for each value of the byte pair TSC = (TSC0, TSC1) when estimating the
probabilities of the induced keystream distributions. This leads to an
algorithm that “bins” ciphertexts into 2^16 groups, one group per TSC, computes
the induced keystream probability for each group, and takes the product of
these across the groups to compute the probability of each plaintext
candidate. Since our new double-byte algorithm in Section 5 can be seen as an
extension of our algorithm in [15], we explain the latter here in more detail.

We first obtain a detailed picture of the distributions of RC4 keystream bytes
$Z_r$, for all positions $r$ in some range, on a per (TSC0, TSC1) pair basis,
by gathering statistics from keystreams generated using a large number of
random keys. That is, for all $r$ in our selected range, we estimate

$$p_{\mathrm{TSC},r,k} := \Pr(Z_r = k), \qquad \mathrm{TSC} \in \mathit{TscSp},\ k \in \mathit{Byte},$$

7 http://www.slac.stanford.edu/~abh/bbcp/
where here (and henceforth) $\mathit{Byte}$ denotes the set
$\{\mathtt{0x00}, \dots, \mathtt{0xFF}\}$, $\mathit{TscSp}$ denotes the set
$\mathit{Byte} \times \mathit{Byte}$, and the probability is taken over the
random choice of the RC4 encryption key $K$, subject to the structure on
$K_0, K_1, K_2$ induced by TSC.
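The estimation step amounts to Monte Carlo sampling of RC4 keystreams. The sketch below is an illustration, not the EC2 code: it implements textbook RC4 and builds 16-byte keys whose first three bytes follow the TKIP per-packet key layout as we understand it from IEEE 802.11 (K0 = TSC1, K1 = (TSC1 | 0x20) & 0x7F, K2 = TSC0), modelling the remaining 13 bytes as uniformly random. Treat that key layout, and the helper names `tkip_like_key` and `estimate_single_byte`, as assumptions to be checked against the standard.

```python
import random

def rc4_keystream(key: bytes, n: int) -> bytes:
    """Return the first n RC4 keystream bytes for the given key."""
    # Key-scheduling algorithm (KSA)
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm (PRGA)
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

def tkip_like_key(tsc0: int, tsc1: int, rng: random.Random) -> bytes:
    """16-byte key whose first three bytes are fixed by TSC (assumed TKIP
    layout); the remaining 13 bytes are modelled as uniformly random."""
    head = bytes([tsc1, (tsc1 | 0x20) & 0x7F, tsc0])
    return head + bytes(rng.getrandbits(8) for _ in range(13))

def estimate_single_byte(tsc0, tsc1, positions, samples, seed=0):
    """Empirical estimate of p_{TSC,r,k}; row r-1 holds keystream position r."""
    rng = random.Random(seed)
    counts = [[0] * 256 for _ in range(positions)]
    for _ in range(samples):
        z = rc4_keystream(tkip_like_key(tsc0, tsc1, rng), positions)
        for r, k in enumerate(z):
            counts[r][k] += 1
    return [[c / samples for c in row] for row in counts]
```

Usable estimates need sample sizes of the order used in Section 3 (2^32 keystreams per TSC), far beyond what pure Python can deliver; that gap is why the real computation relied on OpenSSL's RC4 and the cache-oriented counter optimizations.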

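Now suppose we have $S$ ciphertexts $C_1, \dots, C_S$ available for our
attack. We partition these into 2^16 groups according to the value of TSC
(recall that the TSC value is public); for convenience, we assume the resulting
bins of ciphertexts are all of equal size $T = S/2^{16}$. Let the bin of
ciphertexts associated with a particular value of TSC be denoted
$\mathcal{C}_{\mathrm{TSC}}$ and have members $C_{\mathrm{TSC},j}$ for
$j = 1, \dots, T$; we denote the byte at position $r$ of $C_{\mathrm{TSC},j}$ by
$C_{\mathrm{TSC},j,r}$. For any position $r$ and any candidate plaintext byte
$\mu$ for that position, the vector $(N^{(\mu)}_{\mathrm{TSC},r,k})_{k\in\mathit{Byte}}$ with

$$N^{(\mu)}_{\mathrm{TSC},r,k} = \left|\{\, j \in \{1,\dots,T\} \mid C_{\mathrm{TSC},j,r} = k \oplus \mu \,\}\right| \qquad (\mathtt{0x00} \le k \le \mathtt{0xFF})$$

represents the distribution on $Z_r$ required to obtain the observed ciphertext
bytes $(C_{\mathrm{TSC},j,r})_{1\le j\le T}$ for bin $\mathcal{C}_{\mathrm{TSC}}$
by encrypting $\mu$. The probability $\lambda^{\mathrm{TSC}}_{r,\mu}$ that
plaintext byte $\mu$ is encrypted to bytes $(C_{\mathrm{TSC},j,r})_{1\le j\le T}$
in bin $\mathcal{C}_{\mathrm{TSC}}$ for position $r$ now follows the
distribution:

$$\lambda^{\mathrm{TSC}}_{r,\mu} = \prod_{k\in\mathit{Byte}} \left(p_{\mathrm{TSC},r,k}\right)^{N^{(\mu)}_{\mathrm{TSC},r,k}}. \qquad (2)$$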


Note that this expression differs from that in [15] by the omission of the
factorial terms arising in the multinomial distribution. Those terms do not
need to be included in the formal Bayesian procedure underlying the attack
(since we are interested in the probability of a group of ciphertext bytes as
given in a particular sequence rather than in unordered form). Moreover, their
removal makes the attack slightly easier to implement.

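Now the probability $\lambda_{r,\mu}$ that plaintext byte $\mu$ is encrypted to
the vector of bytes $(C_{\mathrm{TSC},j,r})_{1\le j\le T}$ across all bins
$\mathcal{C}_{\mathrm{TSC}}$ in position $r$ can be precisely calculated as

$$\lambda_{r,\mu} = \prod_{\mathrm{TSC}\in\mathit{TscSp}} \lambda^{\mathrm{TSC}}_{r,\mu}.$$

By computing $\lambda_{r,\mu}$ for all $\mu \in \mathit{Byte}$, and identifying
$P^*_r = \mu$ such that $\lambda_{r,\mu}$ is largest, we determine the
maximum-likelihood plaintext byte value $P^*_r$.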

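Note that, for each position $r$ and group of bytes
$(C_{\mathrm{TSC},j,r})_{1\le j\le T}$, the values $N^{(\mu)}_{\mathrm{TSC},r,k}$
can be computed from the values $N^{(\mathtt{0x00})}_{\mathrm{TSC},r,k}$ by using
the equation $N^{(\mu)}_{\mathrm{TSC},r,k} = N^{(\mathtt{0x00})}_{\mathrm{TSC},r,k\oplus\mu}$
for all $k$. Furthermore, computing and comparing
$\log(\lambda^{\mathrm{TSC}}_{r,\mu})$ and $\log(\lambda_{r,\mu})$ instead of
$\lambda^{\mathrm{TSC}}_{r,\mu}$ and $\lambda_{r,\mu}$ makes the computation
more efficient and numerical accuracy easier to maintain. Adding these
optimisations leads to the attack in Algorithm 3 (which differs from the
corresponding attack in [15] only in the omission of a term corresponding to
the factorial terms discussed above, and in some small notational changes).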
Algorithm 3. Plaintext recovery attack using TSC binning

input:  {C_{TSC,j}} for TSC ∈ TscSp, 1 ≤ j ≤ T: S = 2^16 · T independent
          encryptions of fixed plaintext P
        r: target byte position
        (p_{TSC,r,k}) for TSC ∈ TscSp, k ∈ Byte: keystream distributions for
          all TSC at position r
output: P*_r: estimate for plaintext byte P_r

begin
  N_{TSC,k} ← 0 for all TSC ∈ TscSp, k ∈ Byte
  for TSC = (0x00, 0x00) to (0xFF, 0xFF) do
    for j = 1 to T do
      k ← C_{TSC,j,r}
      N_{TSC,k} ← N_{TSC,k} + 1
  λ_μ ← 0 for all μ ∈ Byte
  for TSC = (0x00, 0x00) to (0xFF, 0xFF) do
    for μ = 0x00 to 0xFF do
      for k = 0x00 to 0xFF do
        λ_μ ← λ_μ + N_{TSC,k⊕μ} · log p_{TSC,r,k}
  P*_r ← argmax_{μ ∈ Byte} λ_μ
  return P*_r
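A compact Python rendering of Algorithm 3 may help. It is a sketch, not the authors' implementation: `recover_byte` works in the log domain and uses the shift identity between the counts for candidate μ and the counts for candidate 0x00, while `toy_distribution` is a hypothetical helper that fabricates a biased keystream distribution purely for demonstration.

```python
import math
import random

def recover_byte(bins, log_p):
    """Maximum-likelihood plaintext byte at one position r (cf. Algorithm 3).

    bins:  dict mapping TSC -> list of ciphertext bytes C_{TSC,j,r}, j = 1..T
    log_p: dict mapping TSC -> list of 256 values log p_{TSC,r,k}
    """
    score = [0.0] * 256                 # log lambda_{r,mu} for every candidate mu
    for tsc, cts in bins.items():
        n = [0] * 256                   # counts of observed ciphertext byte values
        for c in cts:
            n[c] += 1
        lp = log_p[tsc]
        for mu in range(256):
            # Induced keystream count for value k under candidate mu is n[k ^ mu].
            score[mu] += sum(n[k ^ mu] * lp[k] for k in range(256))
    return max(range(256), key=score.__getitem__)

def toy_distribution(biased_value, bias=0.02):
    """Hypothetical keystream distribution with one over-represented byte value."""
    high = 1 / 256 + bias
    low = (1 - high) / 255
    return [high if k == biased_value else low for k in range(256)]

# Demonstration: two TSC bins whose (toy) keystreams are biased differently.
rng = random.Random(7)
PLAINTEXT = 0x41
dists = {(0x00, 0x00): toy_distribution(0x00), (0x00, 0x01): toy_distribution(0x80)}
bins = {tsc: [z ^ PLAINTEXT for z in rng.choices(range(256), weights=p, k=3000)]
        for tsc, p in dists.items()}
log_p = {tsc: [math.log(x) for x in p] for tsc, p in dists.items()}
print(hex(recover_byte(bins, log_p)))
```

The toy biases are orders of magnitude larger than the real WPA/TKIP ones, which is why a few thousand ciphertexts per bin suffice here while the experiments in this section use on the order of 2^24 ciphertexts.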


4.2 Attacks Based on Aggregation 

One method of coping with noisy estimates for the probabilities
$p_{\mathrm{TSC},r,k}$ that was extensively explored in [15] was to consider
aggregation of distributions over TSC0, or over both TSC0 and TSC1 (effectively
increasing the number of keys by factors of 2^8 and 2^16, respectively). It is
not difficult to see how to modify Algorithm 3 to work with 2^8 bins, one for
each value of TSC1, instead of 2^16 bins. The modified algorithm is in practice
faster, since each estimate for a plaintext byte $\mu$ now only involves
calculation of $\lambda^{\mathrm{TSC}_1}_{r,\mu}$ over 2^8 TSC1 values instead
of 2^16 (TSC0, TSC1) pair values. Similarly, one can modify the algorithm to
work with just a single bin covering all values of TSC0 and TSC1, in which case
we recover the original algorithm of [1], albeit without the unnecessary
factorial terms arising from the use of multinomial distributions, and using
WPA/TKIP-specific distributions in place of the original RC4 distributions
reported in [1].
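Aggregation itself is a simple pooling step. The sketch below (hypothetical helper names) merges the per-(TSC0, TSC1) keystream counts at a fixed position into per-TSC1 counts, so each pooled distribution rests on 2^8 times as many keystreams. Feeding the pooled distributions, and correspondingly pooled ciphertext bins, to the binned attack gives the TSC0-aggregated variant. Pooling raw counts rather than averaging probabilities is our choice; with equal sample sizes per TSC, as in the computations of Section 3, the two coincide.

```python
from collections import defaultdict

def aggregate_over_tsc0(per_tsc_counts):
    """Pool per-(TSC0, TSC1) keystream counts at one position into per-TSC1 counts.

    per_tsc_counts: dict mapping (tsc0, tsc1) -> list of 256 counts of Z_r values
    Returns a dict mapping tsc1 -> list of 256 pooled counts.
    """
    pooled = defaultdict(lambda: [0] * 256)
    for (_tsc0, tsc1), counts in per_tsc_counts.items():
        row = pooled[tsc1]
        for k, c in enumerate(counts):
            row[k] += c
    return dict(pooled)

def to_distribution(counts):
    """Normalise counts into an estimate of Pr(Z_r = k)."""
    total = sum(counts)
    return [c / total for c in counts]
```

Pooling over all (TSC0, TSC1) pairs in the same way yields the single fully aggregated distribution used by the algorithm of [1].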

However, the cost of using aggregation is that it “throws away” statistical
information that may be of use in improving the accuracy of the attack for a
given number of ciphertexts S. Indeed, this is demonstrably the case: as we
report below, using our new estimates for the probabilities
$p_{\mathrm{TSC},r,k}$, computed using a total of 2^48 keystreams, in a full
binning (non-aggregated) attack leads to an improvement in accuracy.
Fig. 3. Average success rates of non-aggregated (blue), TSC0-aggregated
(green), and fully aggregated (red) single-byte plaintext recovery attacks for
byte positions 1 to 256 (based on 256 experiments). Broken lines represent the
average recovery rates for the odd byte positions.

4.3 Attack Simulation Results 

We implemented the single-byte plaintext recovery attack of Algorithm 3 based
on the keystream distribution estimates obtained from the Amazon EC2
computations described in Section 3. We furthermore implemented the
TSC0-aggregated and fully aggregated variants of the attack described in
Section 4.2. To obtain bias estimates for the latter two attacks, we aggregated
the Amazon EC2 data correspondingly, thereby obtaining estimates based on 2^40
keystreams per TSC1 value, and 2^48 keystreams, respectively.

The measured success rates of the attacks are shown in Figure 3. We observe
that there is a significant difference in the recovery rates between the fully
aggregated attack and the two other attacks, the non-aggregated attack being
capable of achieving a similar success rate to the fully aggregated attack
using almost 16 times fewer ciphertexts. Likewise, the non-aggregated attack
clearly improves upon the TSC0-aggregated attack, albeit not as significantly;
the non-aggregated attack requires on average half as many ciphertexts to
achieve a similar success rate to the TSC0-aggregated attack.

In order to investigate the effect of our new and (presumably) more accurate
single-byte keystream distributions, we also compared the performance of
Algorithm 3 using keystream distribution estimates based on 2^24 keystreams
per TSC (as in our previous work [15]) and based on 2^32 keystreams per TSC
(obtained from the Amazon EC2 computations described in Section 3). Figure 4
shows the results, with the attacks using the two keystream distributions in
combination with 2^24 ciphertexts in each experiment. There is a clear boost to
the success rate of the attack when moving to the refined keystream
distribution estimates. The effect is particularly pronounced in the odd
positions.

As noted in Section 3, we discovered significant TSC1-dependent single-byte
biases in the RC4 keystreams for WPA/TKIP keys well beyond position 256.
Fig. 4. Success rates of the single-byte plaintext recovery attack against
WPA/TKIP for positions 1 to 256 with 2^24 ciphertexts, using keystream
distribution estimates based on 2^24 keystreams (red) and 2^32 keystreams
(blue) per TSC (success rates based on 256 experiments).


The biases are roughly comparable in size to the single-byte biases seen in RC4
keystreams at positions around 250 for random 128-bit keys (as used in TLS and
reported in [1]). So we might expect to obtain reliable plaintext recovery with
around 2^30 to 2^32 ciphertexts, as in [1]. The full investigation of this
avenue is left to future work.

5 Plaintext Recovery Attacks for WPA/TKIP Based on Double-Byte Biases

Our double-byte bias attack against WPA/TKIP builds on the attack in [1],
and works in the same setting as the above described single-byte bias attack: 
the same plaintext is assumed to be encrypted many times under different RC4 
keys, yielding a set of ciphertexts which is given as input to the attack algorithm. 
However, unlike the attack based on single-byte biases, this attack does
not estimate the likelihoods of the individual plaintext bytes (or plaintext byte 
pairs). Instead, the basic idea of the attack is to estimate likelihoods of sequences 
of plaintext bytes by considering chains of overlapping plaintext byte pairs in 
combination with the double-byte biases in the keystream. 

More precisely, the attack will construct likelihood estimates for sequences 
of plaintext bytes that are gradually increasing in length by extending already 
established sequences and their corresponding likelihood estimates. This is done 
as follows: consider a sequence of plaintext bytes with an already estimated like- 
lihood, and a candidate for the next plaintext byte in the sequence. By XORing 
the pair consisting of the last plaintext byte of the existing sequence and the 
new candidate plaintext byte with the ciphertext byte pairs for the correspond- 
ing positions, an induced distribution on the keystream byte pairs is obtained. 
By comparing this to the known double-byte keystream distribution, a likelihood
estimate for the new candidate plaintext byte can be computed; combining this 
with the likelihood estimate for the initial plaintext sequence, a likelihood esti- 
mate for the extended sequence can be obtained. 

Note that, using a naive algorithm, the complexity of computing the likelihood 
estimates for all possible plaintext sequences would grow exponentially in the 
length of the sequences. Furthermore, considering all possible candidates for the 
next plaintext byte, but only maintaining a small set of the most likely sequences 
after each extension, is not guaranteed to produce a plaintext byte sequence 
that maximises the value of the estimated likelihood. However, as highlighted in 
[1], by tracking which sequences produce the maximum value for the estimated
likelihood for each possible value of the last byte in the sequence, the overall 
plaintext sequence which maximises the likelihood estimate is guaranteed to be 
found. 
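The bookkeeping that makes this exact search tractable is a Viterbi-style dynamic programme. The sketch below is generic, and its names are hypothetical: `log_gamma(r, prev, cur)` is a callback returning the log-likelihood contribution of extending a sequence that ends in byte `prev` at position r with byte `cur` at position r + 1 (in the attack, the logarithm of the conditional factor in equation (3) below). For each possible last byte only the best-scoring sequence is kept, so each step costs 256 × 256 callback evaluations instead of growing exponentially.

```python
def most_likely_sequence(first_byte, length, log_gamma, alphabet=range(256)):
    """Viterbi-style search for the plaintext sequence maximising a chained score.

    first_byte: known plaintext byte at position 1
    length:     number of plaintext bytes to recover (including the first)
    log_gamma:  log_gamma(r, prev, cur) -> log-likelihood contribution of having
                byte `prev` at position r followed by byte `cur` at position r + 1
    """
    # best[b] = (score, sequence) of the highest-scoring sequence ending in byte b
    best = {first_byte: (0.0, [first_byte])}
    for r in range(1, length):
        new_best = {}
        for cur in alphabet:
            # The score of an extension depends only on the predecessor's last
            # byte, so keeping the best sequence per last byte loses nothing.
            score, seq = max(
                ((s + log_gamma(r, prev, cur), q) for prev, (s, q) in best.items()),
                key=lambda t: t[0],
            )
            new_best[cur] = (score, seq + [cur])
        best = new_best
    return max(best.values(), key=lambda t: t[0])[1]
```

When the last plaintext byte is known, as with m_L in Algorithm 4, the final step would score only sequences ending in that byte instead of taking the maximum over all final bytes.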

Compared to the algorithm from [1], the algorithm presented here provides
two refinements made possible by the specific way RC4 is used in WPA/TKIP.
Firstly, as in the attack described in Section 4, likelihood estimates are computed
on a per-TSC basis, and combined across all TSC values to obtain improved overall 
likelihood estimates. Secondly, the attack not only exploits the per-TSC double- 
byte biases in the WPA keystream, but also takes into account the single-byte 
biases in the computation of the likelihood estimates. A more detailed description 
of the algorithm is given next. 

To run the algorithm, accurate estimates of both the single-byte and double- 
byte keystream distributions are required for all positions r the attack is tar- 
geting. By considering the statistics gathered by generating a large number of 
keystreams, we estimate 

$$p_{\mathrm{TSC},r,k} := \Pr(Z_r = k) \quad\text{and}\quad p_{\mathrm{TSC},r,k_1,k_2} := \Pr(Z_r = k_1 \wedge Z_{r+1} = k_2),$$

where $\mathrm{TSC} \in \mathit{TscSp}$, $k, k_1, k_2 \in \mathit{Byte}$, and the
probability is taken over a random choice of RC4 key subject to the structure
on $K_0, K_1, K_2$ induced by TSC.

As in the single-byte bias attack, we suppose we have S ciphertexts available
for our attack, and that, when grouped according to TSC values, each group
contains exactly $T = S/2^{16}$ ciphertexts. We likewise use the notation
$C_{\mathrm{TSC},j,r}$ to denote the ciphertext byte at position $r$ in the
$j$-th member of the group of ciphertexts for the value TSC.

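For a given position $r$, we can now use a similar approach to the single-byte
bias attack to compute the likelihood of a candidate byte $\mu$ or a candidate
byte pair $(\mu, \mu')$ (at positions $(r, r+1)$) corresponding to the encrypted
plaintext byte or byte pair. More specifically, the vectors
$(N^{(\mu)}_{\mathrm{TSC},r,k_1})_{k_1\in\mathit{Byte}}$ and
$(N^{(\mu,\mu')}_{\mathrm{TSC},r,k_1,k_2})_{k_1,k_2\in\mathit{Byte}}$, where

$$N^{(\mu)}_{\mathrm{TSC},r,k_1} = \left|\{\, j \in \{1,\dots,T\} \mid C_{\mathrm{TSC},j,r} = k_1 \oplus \mu \,\}\right|,$$

$$N^{(\mu,\mu')}_{\mathrm{TSC},r,k_1,k_2} = \left|\{\, j \in \{1,\dots,T\} \mid (C_{\mathrm{TSC},j,r}, C_{\mathrm{TSC},j,r+1}) = (k_1 \oplus \mu,\ k_2 \oplus \mu') \,\}\right|,$$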
represent induced distributions on the keystream byte $Z_r$ and keystream byte
pair $(Z_r, Z_{r+1})$, respectively. Indeed, the probability that plaintext byte
$\mu$ is encrypted at position $r$, which we will denote $\alpha^{(\mu)}_r$, and
the probability that $(\mu, \mu')$ is encrypted at positions $(r, r+1)$, which
we will denote $\beta^{(\mu,\mu')}_r$, can be computed as:

$$\alpha^{(\mu)}_r = \prod_{\mathrm{TSC}\in\mathit{TscSp}} \prod_{k_1\in\mathit{Byte}} \left(p_{\mathrm{TSC},r,k_1}\right)^{N^{(\mu)}_{\mathrm{TSC},r,k_1}}, \qquad \beta^{(\mu,\mu')}_r = \prod_{\mathrm{TSC}\in\mathit{TscSp}} \prod_{k_1,k_2\in\mathit{Byte}} \left(p_{\mathrm{TSC},r,k_1,k_2}\right)^{N^{(\mu,\mu')}_{\mathrm{TSC},r,k_1,k_2}}.$$

However, as highlighted earlier, instead of using the above probabilities for
individual plaintext bytes and byte pairs directly, we use these to construct
likelihood estimates for longer sequences of plaintext bytes by considering
chains of overlapping byte pairs. More specifically, consider a plaintext byte
sequence $\mu_1 \| \cdots \| \mu_r$ for positions 1 to $r$ with an already
established likelihood estimate $\lambda_{\mu_1\|\cdots\|\mu_r}$, and a
plaintext candidate byte $\mu_{r+1}$ for position $r+1$. Then we estimate the
likelihood of the plaintext byte sequence $\mu_1 \| \cdots \| \mu_{r+1}$ as:

$$\lambda_{\mu_1\|\cdots\|\mu_{r+1}} = \gamma^{(\mu_r,\mu_{r+1})}_r \cdot \lambda_{\mu_1\|\cdots\|\mu_r}, \qquad (3)$$

where $\gamma^{(\mu_r,\mu_{r+1})}_r$ denotes the conditional probability that
$\mu_{r+1}$ is the plaintext byte at position $r+1$ given that the plaintext
byte at position $r$ is $\mu_r$. Note that, by the definition of conditional
probability, we can compute $\gamma^{(\mu_r,\mu_{r+1})}_r$ based on the
estimates $\alpha^{(\mu_r)}_r$ and $\beta^{(\mu_r,\mu_{r+1})}_r$ as

$$\gamma^{(\mu_r,\mu_{r+1})}_r = \beta^{(\mu_r,\mu_{r+1})}_r \big/ \alpha^{(\mu_r)}_r.$$
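In the log domain, equation (3) turns each extension into adding the log of the double-byte probability and subtracting the log of the single-byte probability defined above. The sketch below computes that increment for a single TSC bin at positions (r, r + 1) directly from ciphertext counts, reusing the XOR-shift trick of the single-byte case; summing it over all TSC values gives the per-candidate increment of phase 2b of Algorithm 4. The function names and the list-of-lists layout are illustrative assumptions.

```python
def log_gamma_bin(single_counts, pair_counts, log_p1, log_p2, mu_prev, mu_cur):
    """Per-bin log(beta / alpha) for candidate pair (mu_prev, mu_cur) at (r, r+1).

    single_counts[c]    -- ciphertexts in the bin with byte c at position r
    pair_counts[c1][c2] -- ciphertexts with byte pair (c1, c2) at (r, r+1)
    log_p1[k]           -- log Pr(Z_r = k) for this TSC
    log_p2[k1][k2]      -- log Pr(Z_r = k1 and Z_{r+1} = k2) for this TSC
    """
    # log beta: the induced count of keystream pair (k1, k2) is the ciphertext
    # pair count at (k1 ^ mu_prev, k2 ^ mu_cur).
    log_beta = sum(
        pair_counts[k1 ^ mu_prev][k2 ^ mu_cur] * log_p2[k1][k2]
        for k1 in range(256)
        for k2 in range(256)
    )
    # log alpha: the same shift applied to the single-byte counts at position r.
    log_alpha = sum(single_counts[k ^ mu_prev] * log_p1[k] for k in range(256))
    return log_beta - log_alpha

def count_bin(ciphertext_pairs):
    """Single and pair counts from the (C_r, C_{r+1}) byte tuples of one bin."""
    single = [0] * 256
    pair = [[0] * 256 for _ in range(256)]
    for c1, c2 in ciphertext_pairs:
        single[c1] += 1
        pair[c1][c2] += 1
    return single, pair
```

With uniform keystream distributions the increment collapses to the number of ciphertexts times log(1/256), a quick sanity check on the sign conventions.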

In the description of the attack algorithm presented here, it is assumed that
the plaintext byte $P_1$ at position $r = 1$ is known. This serves as a
starting point for the algorithm, i.e., the algorithm is initialized with a
single plaintext sequence containing the byte value $P_1$ for position $r = 1$
and with the estimated likelihood $\lambda_{P_1} = 1$. Now, using the method
described above for extending a plaintext byte sequence and the corresponding
likelihood estimate, the attack algorithm iterates over the range of considered
positions as follows. For each position $r$, and for all possible values
$\mu_{r+1}$ of the plaintext byte at position $r+1$, the extension with
$\mu_{r+1}$ of each of the sequences from the previous iteration is considered,
and, for each of the possible values of $\mu_{r+1}$, the algorithm stores the
“most likely” extended sequence having $\mu_{r+1}$ as the last byte value (that
is, it stores the extended sequence which maximises the likelihood estimate
expressed in equation (3)). When the attack algorithm reaches the last
position, it simply returns the sequence with the highest likelihood estimate.

Note that this process is guaranteed to find the plaintext byte sequence with
the highest likelihood estimate computed according to equation (3). However,
we emphasise that this expression yields only an approximation to the actual
plaintext likelihood, being based on the twin assumptions that plaintext bytes
are independently and uniformly distributed and that keystream bytes have no
dependencies beyond those in adjacent bytes as expressed in the double-byte
distributions.

A full description of the attack algorithm is given in Algorithm 4. Note that
the algorithm can easily be extended to work for the case where the plaintext
byte at the initial position is unknown. In particular, by exploiting the
single-byte biases, the likelihoods of all possible values of the initial
plaintext byte can be estimated, and subsequently used as a starting point for
the algorithm. Of course, the algorithm need not start at position $r = 1$
either.

Notice that the algorithm involves heavy nesting of loops, particularly in
phase 2b, where for each position $r$ we perform a computation over all
possible values of the candidate plaintext byte pair $(\mu_{r-1}, \mu_r)$, each
such computation itself involving a sum over 2^32 pairwise products of real
numbers arising from the triple summation over TSC, $k_1$ and $k_2$. Thus a
direct implementation of this algorithm would require on the order of 2^48
additions and products per position! This would be inconvenient, to say the
least. For this reason, and because our double-byte, per-TSC keystream
distributions are not particularly accurate (being based on only 2^30
keystreams each), we would in preference use aggregated versions of the
algorithm. Specifically, building on our experience in the single-byte case, we
may consider a version of the algorithm that works with TSC0-aggregated
distributions and only works on a per-TSC1 basis. It is not hard to see how to
modify Algorithm 4 to operate in this way, saving a factor of 2^8 in its
computational cost. The algorithm could be further modified to use fully
aggregated distributions, saving another factor of 2^8 in computational cost,
but now effectively ignoring any TSC-related structure in the keystream
distributions.
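The operation counts quoted above can be tallied explicitly. Per position, phase 2b ranges over 2^16 candidate pairs $(\mu_{r-1}, \mu_r)$, and each candidate sums over the TSC bins and all 2^16 keystream pairs $(k_1, k_2)$; the single-byte terms are of lower order and are left out of this tally:

$$\underbrace{2^{16}}_{(\mu_{r-1},\mu_r)} \cdot \underbrace{2^{16}}_{\mathrm{TSC}} \cdot \underbrace{2^{16}}_{(k_1,k_2)} = 2^{48}, \qquad \underbrace{2^{16}\cdot 2^{8}\cdot 2^{16}}_{\mathrm{TSC}_0\text{-aggregated}} = 2^{40}, \qquad \underbrace{2^{16}\cdot 1\cdot 2^{16}}_{\text{fully aggregated}} = 2^{32}.$$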

We have performed a very limited validation of our double-byte attack in its
fully aggregated form. A complete evaluation of the algorithm and a comparison
of its performance with the single-byte Algorithm 3 is deferred to the full
version of the paper. We make one observation at this stage, however.
Algorithm 4 makes use of ratios of probability expressions of the form
$\beta^{(\mu_r,\mu_{r+1})}_r / \alpha^{(\mu_r)}_r$, where the numerator is a
double-byte probability and the denominator is a single-byte probability. If
the significant biases in the former probabilities actually arise from products
of single-byte biases for adjacent positions, then such expressions can be
simplified to just single-byte probability terms of the form
$\alpha^{(\mu_{r+1})}_{r+1}$, in effect reducing our double-byte attack to our
single-byte attack. Such behaviour can be expected in early byte positions,
where single-byte biases are very large. Thus we do not expect our double-byte
attack in Algorithm 4 to significantly outperform our single-byte attack in the
early positions. On the other hand, in regions where single-byte biases become
smaller and fewer in number but double-byte biases still persist (as seems to
be the case in later positions), Algorithm 4 may be expected to perform better
than our single-byte attack. Indeed, Algorithm 4 should be able to smoothly
interpolate between regions where single-byte biases dominate and regions where
they do not.
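The reduction to the single-byte attack takes one line to verify. Suppose that, for every TSC, the double-byte distribution factorises as $p_{\mathrm{TSC},r,k_1,k_2} = p_{\mathrm{TSC},r,k_1}\, p_{\mathrm{TSC},r+1,k_2}$. Summing the pair counts over $k_2$ gives the single-byte counts at position $r$, and summing over $k_1$ gives those at position $r+1$, so with $\alpha$ and $\beta$ as defined earlier in this section:

$$\beta^{(\mu,\mu')}_r = \prod_{\mathrm{TSC}} \prod_{k_1,k_2} \left(p_{\mathrm{TSC},r,k_1}\, p_{\mathrm{TSC},r+1,k_2}\right)^{N^{(\mu,\mu')}_{\mathrm{TSC},r,k_1,k_2}} = \prod_{\mathrm{TSC}} \prod_{k_1} \left(p_{\mathrm{TSC},r,k_1}\right)^{N^{(\mu)}_{\mathrm{TSC},r,k_1}} \cdot \prod_{\mathrm{TSC}} \prod_{k_2} \left(p_{\mathrm{TSC},r+1,k_2}\right)^{N^{(\mu')}_{\mathrm{TSC},r+1,k_2}} = \alpha^{(\mu)}_r\, \alpha^{(\mu')}_{r+1},$$

and hence $\beta^{(\mu_r,\mu_{r+1})}_r / \alpha^{(\mu_r)}_r = \alpha^{(\mu_{r+1})}_{r+1}$.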
Algorithm 4. Double-byte bias attack

input:  C: balanced vector of 2^16 · S encryptions of fixed plaintext P
          (C_{TSC,j,r} denotes the r-th byte of the j-th encryption of P for
          TSC value TSC)
        L: length of P in bytes
        m_1 and m_L: known first and last byte of P
        {p_{TSC,r,k}} for TSC ∈ TscSp, 1 ≤ r ≤ L, k ∈ Byte: single-byte
          keystream distributions
        {p_{TSC,r,k1,k2}} for TSC ∈ TscSp, 1 ≤ r < L, k1, k2 ∈ Byte:
          double-byte keystream distributions
output: estimate P* for plaintext P

begin
  N_{TSC,r,k} ← 0 for all TSC ∈ TscSp, 1 ≤ r ≤ L, k ∈ Byte
  N_{TSC,r,k1,k2} ← 0 for all TSC ∈ TscSp, 1 ≤ r < L, k1, k2 ∈ Byte
  initialise mappings Q, Q': Byte → Byte* × ℝ

  // Phase 1 (count occurrences of keystream bytes and byte pairs)
  for each TSC ∈ TscSp do
    for j = 1 to S do
      for r = 1 to L - 1 do
        N_{TSC,r,C_{TSC,j,r}} ← N_{TSC,r,C_{TSC,j,r}} + 1
        N_{TSC,r,C_{TSC,j,r},C_{TSC,j,r+1}} ← N_{TSC,r,C_{TSC,j,r},C_{TSC,j,r+1}} + 1

  // Phase 2a (derive likelihoods for plaintext byte at position 2)
  for μ_2 = 0x00 to 0xFF do
    λ_{m_1‖μ_2} ← Σ_{TSC ∈ TscSp} Σ_{k1,k2 ∈ Byte} N_{TSC,1,k1⊕m_1,k2⊕μ_2} · log p_{TSC,1,k1,k2}
                 - Σ_{TSC ∈ TscSp} Σ_{k ∈ Byte} N_{TSC,1,k⊕m_1} · log p_{TSC,1,k}
    Q[μ_2] ← (m_1‖μ_2, λ_{m_1‖μ_2})

  // Phase 2b (derive likelihoods for plaintext bytes at positions 3 ... (L - 1))
  for r = 3 to L - 1 do
    for μ_r = 0x00 to 0xFF do
      L* ← -∞
      for μ_{r-1} = 0x00 to 0xFF do
        parse Q[μ_{r-1}] as (P', λ_{P'})
        λ_{P'‖μ_r} ← λ_{P'}
            + Σ_{TSC ∈ TscSp} Σ_{k1,k2 ∈ Byte} N_{TSC,r-1,k1⊕μ_{r-1},k2⊕μ_r} · log p_{TSC,r-1,k1,k2}
            - Σ_{TSC ∈ TscSp} Σ_{k ∈ Byte} N_{TSC,r-1,k⊕μ_{r-1}} · log p_{TSC,r-1,k}
        if λ_{P'‖μ_r} > L* then
          (P*, L*) ← (P', λ_{P'‖μ_r})
      Q'[μ_r] ← (P*‖μ_r, L*)
    Q ← Q'

  // Phase 3 (pick most likely plaintext)
  L* ← -∞
  for μ_{L-1} = 0x00 to 0xFF do
    parse Q[μ_{L-1}] as (P', λ_{P'})
    λ_{P'‖m_L} ← λ_{P'}
        + Σ_{TSC ∈ TscSp} Σ_{k1,k2 ∈ Byte} N_{TSC,L-1,k1⊕μ_{L-1},k2⊕m_L} · log p_{TSC,L-1,k1,k2}
        - Σ_{TSC ∈ TscSp} Σ_{k ∈ Byte} N_{TSC,L-1,k⊕μ_{L-1}} · log p_{TSC,L-1,k}
    if λ_{P'‖m_L} > L* then
      (P*, L*) ← (P'‖m_L, λ_{P'‖m_L})
  return P*
(a) 128-bit MPPE keys (b) 40-bit MPPE keys (c) 56-bit MPPE keys

Fig. 5. Pictorial representation of biases in RC4 keystreams for 128-bit,
40-bit and 56-bit MPPE keys, for different positions (x-axis) and byte values
(y-axis). For each position, we encode the bias in the keystream for the
(position, value) combination as a colour; in each case, the colouring scheme
encodes the absolute biases, i.e., the absolute difference between the
occurring probabilities and the (expected) probability 1/256, scaled up by a
factor of 2^16 and capped at a maximum of 0.5.


6 MPPE 

6.1 Computing Keystream Distributions for MPPE Keys 

We also computed the RC4 keystream distributions for the first 256 keystream
bytes using MPPE keys having the structure described in Section 2.3. More
specifically, we generated random 8-byte keys $K = (K_0, \dots, K_7)$ and then
overwrote key bytes according to the MPPE specification for the 40-bit and
56-bit cases, while in the 128-bit case, we generated random 16-byte keys. We
used more than 2^39 keys in each case, with all computations being performed on
our local computing facilities. Figure 5 compares the distributions obtained
for random 128-bit RC4 keys (as used in 128-bit MPPE and in TLS) with those for
40-bit and 56-bit MPPE keys. As can be seen, the process of fixing certain key
bytes to constant values produces many additional, strong biases in the
corresponding keystreams.
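The key-generation step can be sketched as follows. This is an illustration under an explicit assumption: the fixed bytes used below (0xD1, 0x26, 0x9E for 40-bit keys and 0xD1 for 56-bit keys, placed at the start of an 8-byte key) are our reading of the MPPE key-reduction rule and should be checked against the MPPE specification. The function name `mppe_like_key` is hypothetical. The resulting keys would then be fed to an RC4 keystream generator and the output bytes tallied per position, as in the WPA/TKIP case.

```python
import secrets

# Assumed fixed leading bytes for reduced-strength MPPE session keys
# (our reading of the MPPE key-derivation RFCs; verify against the spec).
MPPE_FIXED_PREFIX = {40: bytes([0xD1, 0x26, 0x9E]), 56: bytes([0xD1])}

def mppe_like_key(bits: int, random_bytes=secrets.token_bytes) -> bytes:
    """Random RC4 key with the byte structure assumed for 40/56/128-bit MPPE."""
    if bits == 128:
        return random_bytes(16)           # fully random 16-byte key
    key = bytearray(random_bytes(8))      # K = (K0, ..., K7)
    prefix = MPPE_FIXED_PREFIX[bits]      # KeyError for unsupported sizes
    key[:len(prefix)] = prefix            # overwrite the leading key bytes
    return bytes(key)
```

The reduced keys keep their 8-byte length, so the remaining 5 and 7 random bytes carry exactly 40 and 56 bits of entropy, respectively.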


6.2 Attack Simulation Results 

We used the MPPE keystream distributions to simulate plaintext recovery
attacks using the algorithm of [1], equivalent to the fully aggregated version
of Algorithm 3. The results are depicted in Figure 6. As expected, the additional
structure in RC4 keys introduced by MPPE in the 40-bit and 56-bit cases sig- 
nificantly aids plaintext recovery, with 40-bit keys leading to the highest success 
rate for a given number of ciphertexts. We also experimented with random 64-bit 
keys, finding success rates very close to the random 128-bit case. This indicates 
that it is not the reduction in key-size that makes the difference in MPPE, but 
rather the introduction of fixed key bytes. 
Fig. 6. Average success rates of single-byte plaintext recovery attacks against
MPPE using 40-bit keys (blue), 56-bit keys (red), and 128-bit keys (green) over
positions 1 to 256. The success rates are based on 256 experiments.

7 Conclusions 

In this paper, we have explored the use of cloud computing facilities to
perform large-scale computations in support of the cryptanalysis of WPA/TKIP.
We expended 63 virtual-core-years of computational effort at a cost of US$43k
to carry out two computations, one involving 2^48 keystreams to estimate
per-TSC single-byte distributions, the other involving 2^46 keystreams to
estimate per-TSC double-byte distributions. The total amount of computation was
roughly one-twentieth of that used in the sieving stage for the factorisation
of RSA-768.⁸ The problems of developing efficient code for, and then managing,
these computations were not insignificant but ultimately surmountable. This
suggests that commercial cloud services can be used as a platform for this kind
of work, instead of relying on owned infrastructure. Certainly, running 2^13
hyper-threaded cores in parallel was an exhilarating, if expensive, way to
explore the limits of commercial cloud computing capabilities.

The value of our keystream distribution computations for WPA/TKIP is aptly
illustrated in Figure 4, which shows the marked improvement in success rate
that accrues from moving from single-byte keystream distribution estimates
based on 2^24 keystreams per TSC to 2^32 keystreams per TSC. Our computations
of RC4 keystream distributions in WPA/TKIP and MPPE also provide experimental
data that may be useful in making hypotheses about keystream biases, and which
may in turn lead to a better theoretical understanding of the operation

8 Estimated at 1500 core- years for a single core 2.2 GHz AMD Opteron processor with 

2GB RAM in [9] . 
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of RC4 in these applications. Certainly, having an explanation for the long-lived 
TSCi-specific single-byte biases that we observed experimentally would be very 
welcome. A similar project would investigate the effect of fixing key bytes in RC4 
keys, and apply the results to provide a theoretical explanation for the observed 
biases in MPPE keystreams. 

Our attack on WPA/TKIP based on double-byte biases requires further investigation: the time and budget available for this project have limited our experimentation with it and reduced our investment in its fine-tuning. Given the dominance of single-byte biases in early portions of the RC4 keystreams for WPA/TKIP, we expect this algorithm to come into its own when targeting repeated plaintext that is located later in WPA/TKIP frames (e.g. after position 256). Moreover, it provides a mechanism for smoothly transitioning attacks from the regime where single-byte biases dominate to the regime where these biases are no longer apparent but where double-byte biases are still present. It remains to investigate whether other forms of bias (such as the “ABSAB” biases from [?]) can be effectively integrated into a more general Bayesian approach, and how much impact this might have on overall attack performance.
Abstract. In this paper, we investigate the multi-user setting both in public-key and in secret-key cryptanalytic applications. In this setting, the adversary tries to recover the keys of many users in parallel more efficiently than with classical attacks, i.e., faster than the number of recovered keys multiplied by the time complexity to find a single key, by amortizing the cost among several users. One possible scenario is to recover a single key in a large set of users more efficiently than to recover a key in the classical model. Another possibility is, after some shared precomputation, to be able to learn individual keys very efficiently. This latter model is close to traditional time/memory tradeoff attacks with precomputation. With these goals in mind, we introduce two new algorithmic ideas to improve collision-based attacks in the multi-user setting. Both ideas are derived from the parallelizable collision search proposed by van Oorschot and Wiener. This collision search uses precomputed chains obtained by iterating some basic function. In our cryptanalytic application, each pair of merging chains can be used to correlate the keys of two distinct users. The first idea is to construct a graph whose vertices are keys and whose edges are these correlations. When the graph becomes connected, we simultaneously recover all the keys. Thanks to random graph analysis techniques, we can show that the number of edges needed for this event to occur is small enough to obtain some improved attacks. The second idea modifies the basic technique of van Oorschot and Wiener: instead of waiting for two chains to merge, we now require that they become parallel. We first show that, using the first idea alone, we can recover the discrete logarithms of $L$ users in a group of size $N$ in time $O(\sqrt{NL})$. We put these two ideas together and show that in the multi-user Even-Mansour scheme, all the keys of $L = N^{1/3}$ users can be found with $N^{1/3+\epsilon}$ queries for each user (where $N$ is the domain size). Finally, we consider the PRINCE block cipher (with 128-bit keys and 64-bit blocks) and find the keys of 2 users among a set of $2^{32}$ users in time $2^{65}$. We also describe a new generic attack in the classical model for PRINCE.


P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, Part I, LNCS 8873, pp. 420-438, 2014.
© International Association for Cryptologic Research 2014
1 Introduction

The multi-user setting is a very interesting practical scenario, which is sometimes overlooked in cryptography. Indeed, cryptosystems are designed to be used by many users, yet cryptographers usually prove the security of their schemes in a single-user model, except in some cases such as key exchange, public-key encryption and signatures. At EUROCRYPT 2012, Menezes [20] gave an invited talk pointing out the discrepancy between security proofs for message authentication codes in the single-user and in the multi-user settings. As had already been pointed out in [?], he showed that there is a straightforward reduction between the security proof for one user and that for L users, with the success probability divided by L. Next, he recalled the key collision attack due to Biham [3] that matches this bound and that can be applied to various deterministic MACs (CMAC, SIV, OCB, EME, ...). In this attack, the adversary asks for the MAC tag of a single message M for L different users; we call this the set of secret MACs. Then, for a subset W of N/L known keys (where N is the size of the key space), he computes MAC(k, M) for all $k \in W$ and builds the set of public MACs. If a collision occurs between the public and secret sets, then we learn one of the L secret keys¹. For MAC schemes with an 80-bit security level, time/memory tradeoffs make this reasonably practical, yielding the recovery of a single key among a set of $L = 2^{20}$ users using time and memory $2^{40}$. Menezes thus insists that cryptographers have to consider this practical setting when devising or analyzing cryptosystems. For more results on multi-user attacks, the reader can also refer to [4].
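To make Biham's key collision attack concrete, the following toy sketch runs it against a hypothetical MAC built from SHA-256 (`toy_mac` and the other names are ours; this is not CMAC or any scheme listed above). The key space has $N = 2^{20}$ keys and the tag is 64 bits long, so the tag is longer than the key, as footnote 1 requires. Instead of fixing a subset W, the sketch samples random known keys and spends a constant factor more than N/L public MAC computations so that at least one match is very likely. It is an illustration under these assumptions, not code from the cited works.

```python
import hashlib
import random

KEY_BITS = 20              # toy key space: N = 2^20 keys (assumption)
N = 1 << KEY_BITS

def toy_mac(key: int, msg: bytes) -> bytes:
    """Hypothetical deterministic MAC: SHA-256(key || msg) truncated to 64 bits."""
    return hashlib.sha256(key.to_bytes(4, "big") + msg).digest()[:8]

def key_collision_attack(secret_tags, msg, rng, factor=8):
    """Match the tags of one message under L unknown keys (the secret MACs)
    against tags under about factor * N / L random known keys (the public MACs)."""
    owner = {tag: user for user, tag in enumerate(secret_tags)}
    for _ in range(factor * N // len(secret_tags)):
        k = rng.randrange(N)
        user = owner.get(toy_mac(k, msg))
        if user is not None:
            return user, k          # k reproduces this user's tag: it is the key
    return None
```

With, say, 64 users each tagging the same message, a call returns a pair (user index, key) after roughly N/L hash evaluations on average; the 64-bit tag makes accidental tag matches negligible in this toy setting.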

In this paper, we are interested in collision-based attacks [24] in the multi-user setting. We rely on the distinguished point technique to propose new attacks on the generic discrete logarithm problem, on the Even-Mansour cipher and on PRINCE. Collision-based methods were nicely improved by van Oorschot and Wiener, who made them parallelizable using the distinguished point technique of Rivest and of Quisquater and Delescaille [22]. Here, we extend these methods and apply them to cryptanalysis in the multi-user setting.


Our Contributions. From a cryptanalytic point of view, there are many ways to perform attacks in the multi-user setting. In this paper, we are interested in several scenarios. The first option is to recover all the users' keys (or a large fraction thereof) in time less than the product of the number of users and the time complexity to recover one key. Another direction is to improve Biham's attack and recover a single key in the multi-user setting with a reduced memory cost. Finally, we consider time/memory attacks starting with a precomputation whose result can later be used to recover individual keys much faster.

¹ Provided that the tag length is greater than the key length.

Giant connected component. The multi-user setting for the discrete logarithm problem has been studied by Kuhn and Struik in [?]. They show that it is possible to adapt the parallel version of the Pollard rho technique with distinguished points to recover L keys in time $\sqrt{NL}$, where N is the size of the group, as long as $L \ll N^{1/4}$. In the parallel version of the Pollard rho method described by van Oorschot and Wiener, we run random walks in parallel, stop each of them once a distinguished point is reached, and store this value, for many starting points. We get a public set of distinguished points for the walks that begin at $y_a = g^a$, for which we know $a$, and a secret set from a user public key $y$, for starting points $y g^b$ where $b$ is known. Kuhn and Struik generalize this method by using many secret sets, one for each user. Once a distinguished point appears in both the public and the secret sets, the discrete logarithm of one user can be computed, and consequently, we also know the discrete logarithms of all the distinguished points that were reached during the random walks for this user. Therefore, as the number of “known” points increases, the probability of a collision between a secret point and a known one becomes higher. Similar results can be found in [?].

Here, we show another method that works without any restriction on L and keeps the symmetry between all the points. Indeed, we do not wait for the first collision between a public point and a secret one; we also consider collisions between secret points. When two secret chains collide, we learn the difference between the discrete logarithms of the two users, so as soon as a collision between the public walks and the secret walks happens, we learn many discrete logarithms at once. We can then construct a graph whose vertices are the users, and we add an edge between two users if we know the difference of their discrete logarithms. At some point, when the number of edges becomes slightly larger than the number of users, a giant component emerges in this random graph, and if the public user is in this component (with high probability, in time $2L \ln L$), then the discrete logarithms of all users are known.

Our method has an advantage over the method proposed by Kuhn and Struik, as we use parallelism extensively. However, a disadvantage is that in our case we do not learn any discrete logarithm until the very end, when a giant component appears in the graph. In contrast, Kuhn and Struik's algorithm is sequential, and so it finds the discrete logarithms one after the other. Overall, the main goal of Section 2 is to provide an educational example of the graph connectivity approach and to show that it is much simpler to analyze.

Lambda Method for two different Even-Mansour style functions. We were also able to apply similar techniques to Even-Mansour with domain size N. Indeed, using some functions related to the encryption scheme, we show that we can learn the XOR of the keys of two users. The previous technique can then be used to recover the keys of all users. However, in this case, we face a new problem: the two functions we iterate are no longer the same. Consequently, contrary to the DL case, once a collision appears, the chains do not merge and we cannot use the distinguished point technique. To solve this issue, we tweak the two functions and define related functions whose chains do not merge but become parallel. We show that this parallel method is as efficient as the previous one. For instance, we show an attack that partially solves an open problem of Dunkelman et al., which asks for a memoryless attack on Even-Mansour with D queries to the secret function and $T = N/D$ queries to the public function, with $D \ll \sqrt{N}$. As an application of our lambda-method, we propose an attack that matches these bounds ($D = N^{2/5}$, $T = N^{3/5}$) but uses memory $N^{1/5}$. Furthermore, by combining the two algorithmic tools, we also describe a multi-user attack that learns all the keys in a set of $N^{1/3}$ users with data complexity $N^{1/3+\epsilon}$ for each user and time complexity $T = N^{1/3+\epsilon}$. This attack exhibits a new tradeoff where the amortized data complexity per user times the time complexity is reduced to $N^{2/3+\epsilon}$ instead of $N$.

Application to PRINCE. The PRINCE cipher [7] is a block cipher recently introduced at ASIACRYPT 2012 with a block length of 64 bits and a key length of 128 bits. Its design has an α-reflection property, which is a related-key relation that transforms the decryption algorithm into the encryption process under a related key. Here, we propose generic attacks on the full number of rounds. At FSE 2014 [8], an attack on 10 rounds of PRINCE was presented, with time complexity $2^{60.7}$ and data complexity $2^{57.94}$. In [15], an attack with time complexity slightly less than $2^{128}$ breaks all the rounds, but our attacks have a particularly low time complexity. They are similar to the one on Even-Mansour, but we have to take into account that in PRINCE, the internal permutation uses a secret key. They make use of both the α-reflection property and the specific key schedule of PRINCE, i.e. the relationship between the two whitening keys. The first attack recovers the keys of two users among a set of $2^{32}$ users in time $2^{65}$, and the second one recovers the keys of all users in time $2^{32}$ after a precomputation of time $2^{96}$ using $2^{64}$ memory. We do not contradict the security bound shown in the original paper, but we show that different tradeoffs are possible.

Organization of the Paper. In Section 2, we present our results on the discrete logarithm problem in the multi-user setting, using properties of random graphs. Then, we present various results concerning the security of Even-Mansour: new time/memory/data (denoted T/M/D) tradeoffs, a new time/memory (denoted T/M) attack solving the open problem of Dunkelman et al., and attacks in the multi-user setting. In this part, we show how we can adapt the lambda-method when searching for collisions between two different functions based on the Even-Mansour idea. Finally, in the last section, we present various generic attacks on the PRINCE block cipher, one in the multi-user setting and the other in the classical model.

2 Discrete Logarithms in the Multi-user Setting 

In this section, we present a new algorithmic idea for performing T/M attacks with distinguished points in the multi-user setting. Our technique allows us to compute the discrete logarithms of $L$ public keys $y_i = g^{x_i}$ for $i = 1, \ldots, L$ in time² $\tilde{O}(\sqrt{NL})$ for any value of $L$, where $N = |\langle g \rangle|$.

² The $\tilde{O}$ notation hides logarithmic terms.

P.-A. Fouque, A. Joux, and C. Mavromati

Starting from the parallel version of the Pollard rho method [24], we compute $cL/2$ chains consisting of pseudo-random walks from the $y_i$ ($c/2$ chains for each user, by randomizing the starting point) until we reach a distinguished point $d_i \in S_0$, where $S_0$ denotes the set of distinguished points³. Then, all distinguished points found are sorted, and each collision between the distinguished points $d_i$ and $d_j$ of different users reveals a linear relation between $x_i$ and $x_j$. We also compute a few chains starting from random points $g^{x_0}$ whose discrete logarithm $x_0$ is known. Finally, we construct the random graph whose vertices are the public keys, and we add an edge between $y_i$ and $y_j$ if we have a collision between $d_i$ and $d_j$ (this process can be described more formally as a random graph process). This edge is labelled with the linear relation between $x_i$ and $x_j$. Once we have computed a sufficient number of collisions, a small constant times the number of users, a giant component appears with high probability. More precisely, in a graph with $L$ vertices and $cL/2$ randomly placed edges with $c > 1$, there is a giant component whose size is almost exactly $(1 - t(c))L$ (see [6]), where:


$$t(c) = \frac{1}{c} \sum_{k=1}^{\infty} \frac{k^{k-1}}{k!} \left(c e^{-c}\right)^k.$$


For $c = 4$, we get $1 - t(c) \approx 0.98$. The discrete logarithms of all the points in the component containing the known points $g^{x_0}$ are then known. If we want to recover the discrete logarithms of all users with overwhelming probability, we need $2L \ln L$ edges to connect all the connected components, according to the coupon collector's problem, and not $cL/2$, as recalled in Theorem 2.
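The constant $1 - t(c) \approx 0.98$ for $c = 4$ can be checked numerically. The sketch below is our own illustration, not code from the paper: `t_of_c` sums the series for $t(c)$ in log space, and `largest_component_fraction` throws $cL/2$ uniformly random edges on $L$ vertices and measures the largest component with a union-find structure. The vertex count and the seed are arbitrary choices.

```python
import math
import random

def t_of_c(c: float, terms: int = 200) -> float:
    """t(c) = (1/c) * sum_{k>=1} k^(k-1)/k! * (c e^{-c})^k, each term in log space."""
    y = c * math.exp(-c)
    total = 0.0
    for k in range(1, terms + 1):
        log_term = (k - 1) * math.log(k) - math.lgamma(k + 1) + k * math.log(y)
        total += math.exp(log_term)
    return total / c

def largest_component_fraction(L: int, c: float, rng: random.Random) -> float:
    """Add c*L/2 uniformly random edges on L vertices; return |largest component| / L."""
    parent = list(range(L))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]      # path halving
            a = parent[a]
        return a

    for _ in range(int(c * L / 2)):
        a, b = find(rng.randrange(L)), find(rng.randrange(L))
        if a != b:
            parent[a] = b                      # merge the two components
    sizes = {}
    for v in range(L):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / L
```

For $c = 4$, `1 - t_of_c(4.0)` is close to 0.98, and a simulation with a few tens of thousands of vertices lands within about a percent of that value.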

Let $\ell$ be the average length of the chains and $S_0$ the set of distinguished points, so that $\ell = N/|S_0|$. Assume we have computed $i$ chains that do not collide; the probability that the $(i+1)$-th chain collides with one of the previous ones is about $i\ell \times \ell/N$. Consequently, the expected number of collisions $\mathrm{Coll}$ is:

$$E[\mathrm{Coll}] = \sum_{i=1}^{L-1} \frac{i\ell^2}{N} \approx \frac{L^2 \ell^2}{2N} = \frac{L^2 (N/|S_0|)^2}{2N} = \frac{L^2 N}{2|S_0|^2}.$$

We want the number of collisions to be larger than $cL/2$, which implies $L^2 N / (2|S_0|^2) > cL/2$, thus $|S_0| < \sqrt{LN/c}$. Consequently, the overall cost is dominated by the computation of the chains, i.e. $L \times N/|S_0|$, which is about $\sqrt{cLN}$ if $|S_0| = \sqrt{LN/c}$. Finally, in order to have $cL/2$ edges in our graph, each user has to compute a small number of chains using a small number of random input points of the form $g^{x_i + r_i}$ for known values of $r_i$. The overall complexity of our attack is $\tilde{O}(\sqrt{NL})$ for any value of $L$, while the analysis of Kuhn and Struik achieves $\sqrt{2LN}$ for $L \ll N^{1/4}$.
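The whole procedure fits in a short toy program. The sketch below illustrates it under explicit assumptions and is not the authors' implementation. The group is the subgroup generated by 2 in $\mathbb{Z}_P^*$ for the prime $P = 1000003$; exponents are handled modulo $P - 1$, which is valid because $2^{P-1} = 1$. The pseudo-random walk is an r-adding walk with 16 random multipliers, and a point is distinguished when it is divisible by 509, a hypothetical choice of $S_0$. Vertex 0 stands for the public point $g^0$. Instead of fixing $c$ in advance, the sketch keeps adding one chain per vertex until every user is reachable from vertex 0, in line with the remark that about $2L \ln L$ edges, rather than $cL/2$, are needed to learn every logarithm. The function name `multi_user_dlog` is ours.

```python
import random
from collections import defaultdict, deque

P = 1_000_003      # toy prime modulus (assumption); the group is <G> in Z_P^*
G = 2
ORDER = P - 1      # exponents are reduced mod P - 1, valid since G^(P-1) = 1
R = 16             # number of multipliers of the r-adding walk (assumption)
DIST = 509         # y is distinguished iff y % DIST == 0 (hypothetical S_0)

def multi_user_dlog(keys, rng, max_len=20_000):
    """Return x_u with pow(G, x_u, P) == keys[u] for every user u."""
    steps = [rng.randrange(1, ORDER) for _ in range(R)]
    mults = [pow(G, s, P) for s in steps]
    nodes = [1] + list(keys)          # vertex 0: the point G^0, whose logarithm is known
    table = {}                        # distinguished point -> (vertex, exponent t)
    edges = defaultdict(list)         # vertex -> [(other vertex, log difference)]
    logs = {0: 0}
    while len(logs) < len(nodes):
        for u, y_u in enumerate(nodes):          # one new chain per vertex and round
            t = rng.randrange(ORDER)             # chain starts at y_u * G^t, t known
            y = y_u * pow(G, t, P) % P
            for _ in range(max_len):
                if y % DIST == 0:
                    break
                j = y % R
                y = y * mults[j] % P             # invariant: y = y_u * G^t
                t = (t + steps[j]) % ORDER
            else:
                continue                         # no distinguished point: drop the chain
            if y not in table:
                table[y] = (u, t)
                continue
            v, t_v = table[y]
            if v != u:        # y_u G^t = y_v G^t_v, so log y_u - log y_v = t_v - t
                edges[u].append((v, (t_v - t) % ORDER))
                edges[v].append((u, (t - t_v) % ORDER))
        queue = deque(logs)                      # propagate from the known vertices
        while queue:
            u = queue.popleft()
            for v, d in edges[u]:                # d = log y_u - log y_v
                if v not in logs:
                    logs[v] = (logs[u] - d) % ORDER
                    queue.append(v)
    return [logs[u] for u in range(1, len(nodes))]
```

Each returned exponent is only determined modulo the order of the base, so a check should compare `pow(G, x, P)` with the public key rather than compare exponents directly.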

Another possible approach to analyzing known and unknown points and the collisions between them would be to use a matrix. For this, we consider a symmetric matrix $M$ where $M[i, j]$ represents the linear relation between the discrete logarithms of $i$ and $j$. Then we sparsify the matrix at random. More precisely, we multiply coefficient $(i, j)$ of the matrix by 1 with probability $p$ and by 0 with probability $1 - p$, independently for each coefficient. A coefficient multiplied by 1 means that we know the difference between the discrete logarithms of $i$ and $j$. The question then becomes how many rows (with 2 non-zero coefficients) we need to achieve full column rank, which naturally leads to the same result: $O(L \log L)$. However, when considering rows with $O(\log L)$ non-zero coefficients, we only need $O(L)$ rows. This would imply that for multi-user discrete logarithms, the overall complexity can be reduced by a factor $\log L$ to $O(\sqrt{LN})$, by spending a factor $\log L$ more work generating starting points that are random combinations of $\log L$ known/unknown points (e.g., see [?]). We choose to analyze the complexity in the same way as van Oorschot and Wiener, as is usual in cryptographic papers, i.e. we ignore the $\log N$ factors that arise in such birthday algorithms. Indeed, the Kuhn and Struik algorithm also hides a $\log N$ factor in order to get collisions with very high probability, because a probability of 1/2 is not sufficient when many collisions of this type are needed.

³ This algorithm can also be adapted to the Pollard lambda algorithm [21].

3 Even-Mansour in the Single and Multi-user Settings 

3.1 Brief Description of Even-Mansour 

At Asiacrypt 1991, Even and Mansour [14] described a very efficient design (called EM in the following) for constructing a block cipher, i.e. a keyed permutation family $\Pi_{K_1,K_2}$, from a large permutation $\pi$. The key $K_1$ is first XORed with the plaintext, then the fixed permutation is applied, and finally the key $K_2$ is XORed to obtain the final value.


$$\Pi_{K_1,K_2}(P) = \pi(P \oplus K_1) \oplus K_2.$$

Their main result is a security proof showing that any attack that uses $D$ on-line plaintext/ciphertext pairs (queries to $\Pi$) and $T$ off-line computations (queries to $\pi$) must satisfy $DT = N$, where $N = 2^n$ with $n$ the bit length of the plaintext and of the key; we call this the EM curve. The important part of the proof is that it gives a lower bound for all attacks, including known-plaintext attacks. It appears that the use of two keys $K_1$ and $K_2$ does not add much resistance to the scheme. The variant using $K = K_1 = K_2$ has been proposed under the name Single-Key Even-Mansour, and we denote it by $\Pi_K$. This minimal version was proved secure by Dunkelman et al. with the same bound as the two-key version. This minimal version is amazingly resistant and guarantees the same security bound, but this is not unexpected, since attacks usually look for the two keys independently, and once the key $K_1$ is recovered, there is no security left for $K_2$. In the following, we see that the two-key version does not improve the security, since most of the attacks on the single-key version can be leveraged against it.

In this section, we describe new results concerning the security of the Even-Mansour scheme, which has recently been the subject of many papers [?]. We recall the basic attacks, and then we present a basic T/M tradeoff for known-plaintext attacks with better on-line complexity (Sect. 3.3) and a better T/M tradeoff for adaptive queries (Sect. 3.4). For this attack, we introduce our second algorithmic trick to discover collisions between two different functions based on the Even-Mansour construction. The main difficulty we have to solve is that when a lambda-like method is used to find collisions between two different functions, the chains no longer merge after the collision. To this end, we adapt the lambda-method so that the chains become parallel when the collision happens. Finally, we show that in the multi-user setting (Sect. 3.5) the precomputation cost can be amortized. It is possible to balance all the complexities to recover all the keys of $N^{1/3}$ users with $N^{1/3+\epsilon}$ adaptive queries to each user and a precomputation time of $N^{1/3+\epsilon}$; the attack requires $N^{1/3+\epsilon}$ memory and $N^{1/3+\epsilon}$ on-line time.


3.2 Previous Attacks on Even-Mansour 

In [12], Daemen showed that the EM curve $TD = N$ is valid for a known-plaintext attack at the point $(T = N/2, D = 2)$. He also gave a chosen-plaintext attack that matches the EM curve for any value of $D$ and $T$, and in particular at the point $(T = N^{1/2}, D = N^{1/2})$. Later, Biryukov and Wagner described a slide attack that matches the EM curve for known plaintexts, but only at the point $(T = N^{1/2}, D = N^{1/2})$. Recently, Dunkelman et al. introduced a new twist on the slide attack whose complexities match the whole curve for any value of $D$ and $T$ using a known-plaintext attack, which exactly matches the bound proved by Even and Mansour. Finally, Dunkelman et al. also provide a slidex attack on the two-key Even-Mansour scheme.

Simpler collision-based attack on Single-Key Even-Mansour. In the single-key case, a simpler attack achieves the same performance. The basic idea is to apply the Davies-Meyer construction to $\Pi$ and to $\pi$. More precisely, write:

$$F_\Pi(x) = \Pi(x) \oplus x \quad \text{and} \quad F_\pi(x) = \pi(x) \oplus x.$$

For any value of $x$, the equality $F_\Pi(x) = F_\pi(x \oplus K)$ is satisfied, since both sides equal $\pi(x \oplus K) \oplus x \oplus K$. Moreover, any collision $F_\Pi(x) = F_\pi(y)$ between these two functions indicates that $x \oplus y$ is a likely candidate for the key $K$.

With this idea in mind, the problem of attacking the single-key Even-Mansour scheme is reduced to the problem of finding a collision (or rather a few collisions) between $F_\Pi$ and $F_\pi$. The simplest approach is to compute $F_\pi$ on $T$ distinct random values and $F_\Pi$ on $D$ distinct random values. When $DT \approx N$, one expects to find the required collisions.
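As an illustration of this simple attack, here is a toy sketch on a 16-bit block (our assumption, chosen so that it runs instantly). The public permutation $\pi$ is a random shuffle stored as a table, while $\Pi$ is available only as an oracle. The sketch tabulates $F_\Pi(x) = \Pi(x) \oplus x$ on $D$ random inputs, scans $F_\pi(y) = \pi(y) \oplus y$ on $T$ random inputs, and filters each candidate $x \oplus y$ with two extra known plaintext/ciphertext pairs. The function name is ours.

```python
import random

def single_key_em_attack(Pi, pi, n, D, T, rng):
    """Recover K from the oracle Pi(p) = pi[p ^ K] ^ K and the table pi.
    Uses F_Pi(x) = Pi(x) ^ x and F_pi(y) = pi[y] ^ y, where F_Pi(x) = F_pi(x ^ K)."""
    N = 1 << n
    table = {}
    for x in rng.sample(range(N), D):              # D keyed (on-line) queries
        table.setdefault(Pi(x) ^ x, []).append(x)
    known = [(p, Pi(p)) for p in (0, 1)]           # two extra pairs to filter candidates
    for y in rng.sample(range(N), T):              # T unkeyed (off-line) evaluations
        for x in table.get(pi[y] ^ y, []):
            k = x ^ y                              # candidate key from F_Pi(x) = F_pi(y)
            if all(pi[p ^ k] ^ k == c for p, c in known):
                return k
    return None
```

With $n = 16$, $D = 2^8$ and $T = 2^{12}$, the product $DT = 16N$ sits a constant factor above the EM curve, so several true collisions are expected and the key is returned with overwhelming probability.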

Moreover, this can be done more efficiently by using classical collision search algorithms with reduced memory. Indeed, it is possible to use Floyd's cycle-finding algorithm to obtain such a solution for the special case $D = T = N^{1/2}$, without using memory. However, in this case the attack is no longer a known-plaintext attack and becomes an adaptively chosen plaintext attack.
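The memoryless variant can be sketched with Floyd's algorithm. This is an illustration under our own assumptions, not code from the paper. Because Floyd iterates a single function, the sketch walks on a mixture $h$ that applies $F_\Pi$ or $F_\pi$ depending on the parity of the bitwise AND of $x$ with a random mask $r$. For the masks where this parity differs between $x$ and $x \oplus K$ (half of them), every pair $\{x, x \oplus K\}$ collides under $h$, so the collision returned by Floyd frequently reveals $K$. Other collisions are rejected with two known pairs, and the walk restarts with a fresh mask. The queries to $\Pi$ are adaptive, and only a constant number of values is stored. The function names are ours.

```python
import random

def floyd_collision(h, x0):
    """Floyd's cycle finding: return (a, b) with a != b and h(a) == h(b), or None."""
    tort, hare = h(x0), h(h(x0))
    while tort != hare:                  # phase 1: meet somewhere on the cycle
        tort, hare = h(tort), h(h(hare))
    tort = x0
    if tort == hare:
        return None                      # x0 lies on the cycle: no tail, no collision
    while h(tort) != h(hare):            # phase 2: stop just before the cycle entry
        tort, hare = h(tort), h(hare)
    return tort, hare

def memoryless_em_attack(Pi, pi, n, rng, tries=64):
    """Single-key EM with O(1) memory: collide F_Pi(x) = Pi(x) ^ x with F_pi(x) = pi[x] ^ x."""
    N = 1 << n
    c0, c1 = Pi(0), Pi(1)                # known pairs used to test candidates
    for _ in range(tries):
        r = rng.randrange(1, N)          # random mask selecting the function at each point

        def h(x, r=r):
            if bin(x & r).count("1") & 1:
                return Pi(x) ^ x         # F_Pi(x)
            return pi[x] ^ x             # F_pi(x)

        pair = floyd_collision(h, rng.randrange(N))
        if pair is None:
            continue
        k = pair[0] ^ pair[1]            # correct when the pair is {x, x ^ K}
        if pi[0 ^ k] ^ k == c0 and pi[1 ^ k] ^ k == c1:
            return k
    return None
```

Each attempt costs on the order of $N^{1/2}$ evaluations of $h$, split between queries to $\Pi$ and to $\pi$, which is the $D = T = N^{1/2}$ point mentioned above.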


Multi-user Collisions and Applications 




Dunkelman, Keller and Shamir ask whether it is possible to generalize this and to find memoryless attacks using $D$ queries to $\Pi$ and $N/D$ queries to $\pi$, where $D \ll N^{1/2}$.

In this paper, we partially answer this question by proposing attacks that use $D \ll N^{1/2}$ data and memory lower than $\min(T, D)$ if we require the unkeyed queries to be precomputed. Without this requirement, we achieve a memoryless attack.


3.3 Extending the Simple Attack 

Dealing with two-key Even-Mansour. A first important remark is that the simple attack on Single-Key EM can be extended to the two-key case. The idea is simply to replace the function $\pi(x) \oplus x$ by another function with similar properties. A first requirement is that the chosen function can be expressed by two different formulas, one based on $\pi$ and the other on $\Pi$. The other requirement is that a collision between two evaluations, one of each type, should yield good candidates for the keys.

We now construct the required function and show that the simple attack on the single-key variant can be extended to two keys. We first choose a random non-zero constant $\delta$ and let:

$$F_\Pi(x) = \Pi(x) \oplus \Pi(x \oplus \delta) \quad \text{and} \quad F_\pi(x) = \pi(x) \oplus \pi(x \oplus \delta).$$

We remark that $F_\Pi(x) = F_\pi(x \oplus K_1)$ and $F_\Pi(x \oplus \delta) = F_\pi(x \oplus K_1)$ are both satisfied (the key $K_2$ cancels out). As a consequence, every collision $F_\Pi(x) = F_\pi(y)$ now suggests two distinct candidate keys, $K_1 = x \oplus y$ and $K_1 = x \oplus y \oplus \delta$. Except for this detail, the attack remains unchanged. Note that once $K_1$ has been found, recovering $K_2$ is a trivial matter, since $K_2 = \Pi(P) \oplus \pi(P \oplus K_1)$ for any known plaintext $P$.
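The two-key variant only changes the function and the candidate set. In the toy sketch below (our own assumptions: a 16-bit random permutation $\pi$ stored as a table, an oracle for $\Pi$, and a random constant $\delta$), each collision $F_\Pi(x) = F_\pi(y)$ yields the two candidates $x \oplus y$ and $x \oplus y \oplus \delta$ for $K_1$. For each candidate, $K_2 = \Pi(0) \oplus \pi(K_1)$ is derived and the pair is checked on two more known plaintexts. Each tabulated $x$ costs two queries to $\Pi$. The function name is ours.

```python
import random

def two_key_em_attack(Pi, pi, n, D, T, rng):
    """Collision attack on Pi(p) = pi[p ^ K1] ^ K2 using F(x) = E(x) ^ E(x ^ delta)."""
    N = 1 << n
    delta = rng.randrange(1, N)                    # random non-zero constant
    table = {}
    for x in rng.sample(range(N), D):              # F_Pi(x) = Pi(x) ^ Pi(x ^ delta)
        table.setdefault(Pi(x) ^ Pi(x ^ delta), []).append(x)
    c = [Pi(p) for p in range(3)]                  # known pairs (p, Pi(p)), p = 0, 1, 2
    for y in rng.sample(range(N), T):              # F_pi(y) = pi(y) ^ pi(y ^ delta)
        for x in table.get(pi[y] ^ pi[y ^ delta], []):
            for k1 in (x ^ y, x ^ y ^ delta):      # the two candidates for K1
                k2 = c[0] ^ pi[k1]                 # K2 = Pi(0) ^ pi(0 ^ K1)
                if all(pi[p ^ k1] ^ k2 == c[p] for p in (1, 2)):
                    return k1, k2
    return None
```

With $n = 16$, $D = 2^8$ and $T = 2^{12}$, dozens of true collisions are expected, so the pair $(K_1, K_2)$ is found with overwhelming probability.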

Reducing the on-line time complexity. In this section, we focus on known-plaintext attacks, and we first show that the EM security model does not separate the on-line and off-line time complexities, as is usually done in T/M/D tradeoffs. It is then possible to use a T/M/D tradeoff for this block cipher design, as suggested by Biryukov and Shamir in [5].

Let us separate the on-line time, denoted by $T_{on}$, and the off-line time, denoted by $T_{off}$. Clearly, the total time complexity $T$ is $T_{on} + T_{off}$.

The main idea of this section is to use a different approach to find a collision between $F_\Pi$ and $F_\pi$. More precisely, given a value of $F_\Pi$, we try to invert $F_\pi$ on this value. If we succeed, we clearly obtain the desired collision. In order to invert $F_\pi$, we rely on Hellman's algorithm. The T/M/D tradeoff is

$$T_{on} M^2 D^2 = N^2 \quad \text{and} \quad D^2 \le T_{on} \le N.$$

In order to fully use Hellman's tradeoff with multiple tables, we can use the constant $\delta$ in the definition of the function $F_\pi(x) = \pi(x) \oplus \pi(x \oplus \delta)$ to define different, independent functions for each table. These attacks achieve $T_{on} D \le N$ while $TD = N$.
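To give a feel for the tradeoff curve, here is a purely illustrative instance; the parameters $n = 64$, $D = 2^{16}$ and $M = 2^{32}$ are our own choice and do not come from the paper:

$$\begin{aligned}
T_{on} &= \frac{N^2}{M^2 D^2} = \frac{2^{128}}{2^{64} \cdot 2^{32}} = 2^{32}, \\
D^2 &= 2^{32} \le T_{on} = 2^{32} \le N = 2^{64}, \\
T_{on} D &= 2^{48}.
\end{aligned}$$

The on-line time is therefore far below the $2^{48}$ that the EM curve $TD = N$ assigns to the total time for this amount of data.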
Using less data than memory. Despite its optimal efficiency in terms of known-plaintext attacks matching the EM curve, the slidex attack presents an important drawback. Indeed, the public permutation $\pi$ needs to be evaluated at points that depend on the results of the queries to the keyed Even-Mansour construction $\Pi$. As a consequence, with this attack, it is not possible to precompute the queries to $\pi$ in order to improve the on-line time required to obtain the key of $\Pi$.

Our previous attack based on Hellman's tables does not require adaptive queries; it is less costly than the slidex attack in terms of on-line time complexity, but more costly than the simple collision-based attack (which uses adaptively chosen plaintexts). The goal of the next subsection is to present an attack on $\Pi$ which is based on classical collision search algorithms and works by using queries to $\pi$ and $\Pi$ without any cross-dependencies. The queries to $\Pi$ are adaptive, but this new attack is more flexible for performing T/M tradeoffs.


3.4 Time/Memory/Data Tradeoff Attack on Even-Mansour

Attacking Even-Mansour using distinguished point methods. In order to attack Even-Mansour using a distinguished point method, we would like to construct a set of chains using the public permutation $\pi$ and then find a collision with a chain obtained from the keyed permutation $\Pi$. One difficulty is that chains computed from $\pi$ and from $\Pi$ can never merge, since they are based on different functions, contrary to the discrete logarithm case. We introduce here a new idea to solve this dilemma when the functions are based on the Even-Mansour construction. Let us define:

$$F_\Pi(x) = x \oplus \Pi(x) \oplus \Pi(x \oplus \delta) \quad \text{and} \quad F_\pi(x) = x \oplus \pi(x) \oplus \pi(x \oplus \delta).$$

We remark that $F_\Pi(x \oplus K_1) = F_\pi(x) \oplus K_1$. As a consequence, two chains based on $F_\Pi$ and $F_\pi$ cannot merge, but they may become parallel. Indeed, let $X$ and $x$ be two points such that $X = x \oplus K_1$, where $X$ belongs to an $F_\Pi$ chain and $x$ belongs to an $F_\pi$ chain. Using the equation $F_\Pi(x \oplus K_1) = F_\pi(x) \oplus K_1$, the next element $Y = F_\Pi(X)$ in the $F_\Pi$ chain and the next element $y = F_\pi(x)$ in the $F_\pi$ chain satisfy:

$$Y = F_\Pi(X) = F_\Pi(x \oplus K_1) = F_\pi(x) \oplus K_1 = y \oplus K_1.$$

So $Y = y \oplus K_1$, which means that $Y$ and $y$ satisfy the same relation as $X$ and $x$, and so on. Therefore, as soon as by chance $X = x \oplus K_1$, where $X$ is an element of an $F_\Pi$ chain and $x$ is an element of an $F_\pi$ chain, the same relation holds for the subsequent points of the two chains, i.e. we get two parallel chains.
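The parallel-chain property is easy to check experimentally. The toy sketch below (a 16-bit block, a random permutation and random keys, all our own illustrative assumptions) builds one $F_\Pi$ chain and one $F_\pi$ chain started at points with $X = x \oplus K_1$, and records $X \oplus x$ at every step. Every recorded offset equals $K_1$, as the derivation predicts. The function name is ours.

```python
import random

def parallel_offsets(n=16, steps=50, seed=3):
    """Follow an F_Pi chain and an F_pi chain started at X = x ^ K1 and
    return the offsets X ^ x observed at each step, together with K1."""
    rng = random.Random(seed)
    N = 1 << n
    pi = list(range(N))
    rng.shuffle(pi)                                  # toy public permutation
    K1, K2 = rng.randrange(N), rng.randrange(N)      # toy two-key Even-Mansour keys
    delta = rng.randrange(1, N)

    def Pi(p):                                       # keyed permutation (oracle)
        return pi[p ^ K1] ^ K2

    def F_Pi(x):
        return x ^ Pi(x) ^ Pi(x ^ delta)

    def F_pi(x):
        return x ^ pi[x] ^ pi[x ^ delta]

    x = rng.randrange(N)
    X = x ^ K1                                       # the lucky event X = x ^ K1
    offsets = []
    for _ in range(steps):
        offsets.append(X ^ x)
        X, x = F_Pi(X), F_pi(x)                      # advance both chains one step
    return offsets, K1
```

Calling `parallel_offsets()` returns 50 offsets, all equal to the returned $K_1$.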

Moreover, the detection of this good event is compatible with the distinguished point method. Indeed, it suffices to define a distinguished point $x$ as a point with a value of $\pi(x) \oplus \pi(x \oplus \delta)$ in $S_0$. Similarly, for chains constructed using $\Pi$, we define a distinguished point $X$ as a point with a value of $\Pi(X) \oplus \Pi(X \oplus \delta)$ in $S_0$. Now if $X = x \oplus K_1$ and $x$ is a distinguished point in a $\pi$ chain, then since

$$\Pi(X) \oplus \Pi(X \oplus \delta) = \pi(X \oplus K_1) \oplus \pi(X \oplus K_1 \oplus \delta) = \pi(x) \oplus \pi(x \oplus \delta),$$

the point $X$ is also a distinguished point in the $\Pi$ chain, and therefore $X \oplus x$ gives a candidate for $K_1$. Since the values $\pi(x) \oplus \pi(x \oplus \delta)$ and $\Pi(X) \oplus \Pi(X \oplus \delta)$ are needed anyway to compute the next element in the chains, using this definition does not add any extra cost for distinguished point detection. The important point is that for a parallel chain based on $F_\Pi$, a point $X = x \oplus K_1$ corresponds to a distinguished point $x$ if and only if $\Pi(X) \oplus \Pi(X \oplus \delta)$ is in $S_0$.

An important difference compared to the classical search for collisions is that we do not need to backtrack to the beginning of the chains to identify where the chains merge. Indeed, seeing parallel distinguished points suffices to get candidate values for $K_1$.

Analysis of the attack with precomputation. Since there is a clear symmetry between the keyed and unkeyed queries, we may assume that the number of unkeyed queries $T$ is larger than the number of keyed queries $D$. This is the most reasonable scenario, since keyed queries are usually the most constrained resource. Let $B_T$ be the number of unkeyed chains, which we use to increase the probability of a collision between keyed and unkeyed chains. In this case, we need to choose the expected length $\ell$ of the chains we are going to construct and $B_T$ so that they satisfy the following relations:

$$T = \ell \cdot B_T \quad \text{and} \quad N = B_T \ell^2.$$

Thus, $\ell = N/T$ and $B_T = T^2/N$. The memory required to store these chains is $O(B_T)$.

After completing the computation of the unkeyed chains, we can turn to the keyed side. On this side, we want to perform about $D = N/T$ evaluations of the function. Since $D = \ell$, this means that we compute a single keyed chain and expect it to collide (in parallel) with an unkeyed chain.

We are interested in values of $M$ such that $M < D$. Consequently, as $M \approx T/D = N/D^2$, we have $N < D^3$. Let us consider $N^{1/3} < D = N^\alpha < N^{1/2}$. For example, if $D = N^{2/5}$ and $T = N^{3/5}$, then $M = N^{1/5}$, which is much smaller than $N^{2/5}$. This attack requires an amount of data $D \ll N^{1/2}$, and although this attack is not memoryless (as asked in the open problem), the memory is smaller than the data.
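Putting the pieces together, the attack with precomputation can be sketched as follows. This is a toy illustration under our own assumptions, not the authors' code. The block is 16 bits, $\pi$ is a random permutation given as a table, $\Pi$ is an oracle, and $S_0$ is taken to be the values whose $d$ low bits are zero, so that chains have expected length about $\ell = 2^d$. Unkeyed chains are stored under the signature $\pi(x) \oplus \pi(x \oplus \delta)$ of their distinguished point, and a keyed chain that runs parallel to one of them ends on a point with the same signature. The identity $F_\Pi(x \oplus K_1 \oplus \delta) = F_\pi(x) \oplus K_1 \oplus \delta$ also holds, so chains can be parallel with offset $K_1 \oplus \delta$. The sketch therefore tests both $X \oplus x$ and $X \oplus x \oplus \delta$, derives $K_2 = \Pi(0) \oplus \pi(K_1)$, and checks two more pairs. It also allows a few keyed chains instead of exactly one, so that the small example succeeds reliably. The function name is ours.

```python
import random

def tmd_em_attack(Pi, pi, n, d, B_T, rng, max_keyed_chains=8):
    """Distinguished-point attack on two-key Even-Mansour via parallel chains.
    x is distinguished when its signature pi[x] ^ pi[x ^ delta] has d low zero bits."""
    N, mask = 1 << n, (1 << d) - 1
    cap = 20 << d                                  # give up on chains stuck in a cycle
    delta = rng.randrange(1, N)
    table = {}                                     # DP signature -> distinguished points
    for _ in range(B_T):                           # off-line: B_T chains of F_pi
        x = rng.randrange(N)
        for _ in range(cap):
            s = pi[x] ^ pi[x ^ delta]
            if s & mask == 0:
                table.setdefault(s, []).append(x)
                break
            x ^= s                                 # F_pi(x) = x ^ pi(x) ^ pi(x ^ delta)
    known = [Pi(p) for p in range(3)]              # Pi(0), Pi(1), Pi(2)
    for _ in range(max_keyed_chains):              # on-line: chains of F_Pi
        X = rng.randrange(N)
        for _ in range(cap):
            S = Pi(X) ^ Pi(X ^ delta)              # K2 cancels: signature of X ^ K1
            if S & mask == 0:
                for x in table.get(S, []):
                    for k1 in (X ^ x, X ^ x ^ delta):   # offset K1 or K1 ^ delta
                        k2 = known[0] ^ pi[k1]          # K2 = Pi(0) ^ pi(0 ^ K1)
                        if all(pi[p ^ k1] ^ k2 == known[p] for p in (1, 2)):
                            return k1, k2
                break
            X ^= S                                 # F_Pi(X) = X ^ Pi(X) ^ Pi(X ^ delta)
    return None
```

With $n = 16$ and $d = 6$ ($\ell = 64$), the choice $D = \ell$ and $T = N/D$ from the text gives $B_T = T^2/N = 16$; taking a few times more unkeyed chains (for instance 64) makes this toy run succeed with high probability.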

Relaxing the precomputation requirement. Another alternative is to perform the same attack while computing the keyed queries before the unkeyed ones. In this case, since there is a single keyed chain to be stored, we can achieve the attack using a constant amount of memory. Moreover, this variation works for any $D = N^\alpha < N^{1/2}$ using $T = N/D$.


3.5 Attacks in the Multi-user Setting 

In the multi-user setting, we assume that $L$ different users are all using the Even-Mansour scheme based on the same public permutation $\pi$, with each user having its own key⁴, chosen uniformly at random and independently from the keys of the other users.

⁴ We thank an anonymous reviewer of Asiacrypt 2014 for pointing this out.

Of course, the attack from Section 3.4 can easily be applied in this context. Depending on the exact goal of the cryptanalysis, we have two main options:

1. If the goal is to recover the keys of all users, the previous attack can be applied by repeating the $D$ key-dependent queries for each user, while amortizing the $T$ unkeyed queries across users. A typical case is to consider $L = N^{1/3}$ users and to perform $T = N^{2/3+\epsilon}$ unkeyed queries ($N^{1/3+\epsilon}$ chains of $N^{1/3}$ queries, memory $N^{1/3}$). For each new user, we need $N^{1/3+\epsilon}$ key-dependent queries. As a consequence, the amortized cost per user (up to constant factors, $c_0 = 20$) is $N^{1/3+\epsilon}$ queries of each type, and the required memory is also $N^{1/3}$.

2. If the goal of the cryptanalyst is to obtain at least one user key among all the users, it suffices to split the $D$ key-dependent queries arbitrarily across the users.

However, we present in this section a much more efficient tradeoff in the
multi-user setting. This tradeoff becomes possible without a precomputation of
$N^{2/3}$ unkeyed queries, by distributing the unkeyed queries among the users and
by reusing the graph-algorithmic idea introduced earlier in the paper. For this,
we construct a graph whose vertices are labelled by the users. Whenever we
obtain a (parallel) collision between the chains of $F_{EM}^{(i)}$ and $F_{EM}^{(j)}$ at
points $x$ and $y$, for users $i$ and $j$, we add an edge between the corresponding
vertices, labelled with $x \oplus y$, which is expected to equal $K_i \oplus K_j$. Note that
this means that we know the exclusive-or of the first keys of the two users.

If we have $L$ vertices and $cL/2$ random edges with $c = 4$, there is a giant
component containing about 98% of the vertices, and with $cL \ln L$ edges, all the
vertices are in this component with overwhelming probability. Consequently, we
obtain the exclusive-or of the first keys for any pair of users in the giant
component. To conclude the attack, it suffices to find a single collision between
the function $F_{EM}^{(i)}$ of any user in the large connected component and the
unkeyed function $F_\pi$ to reveal the keys of all these users.
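The 98% figure can be checked numerically. The following sketch assumes the standard Erdős-Rényi approximation. It solves the fixed-point equation $s = 1 - e^{-cs}$ for the asymptotic fraction $s$ of vertices in the giant component. It then compares that value with a small simulation that adds $cL/2$ uniformly random edges. The number of vertices and the seed are arbitrary choices of ours.

```python
import math
import random


def giant_fraction_theory(c, iterations=200):
    """Solve s = 1 - exp(-c*s) by fixed-point iteration: the asymptotic
    fraction of vertices in the giant component when the average degree is c."""
    s = 1.0
    for _ in range(iterations):
        s = 1.0 - math.exp(-c * s)
    return s


def giant_fraction_simulated(num_vertices, c, seed=0):
    """Add c*L/2 uniformly random edges and return the relative size of the
    largest connected component (union-find with path halving)."""
    rng = random.Random(seed)
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for _ in range(int(c * num_vertices / 2)):
        ra = find(rng.randrange(num_vertices))
        rb = find(rng.randrange(num_vertices))
        if ra != rb:
            parent[ra] = rb

    sizes = {}
    for v in range(num_vertices):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / num_vertices


if __name__ == "__main__":
    print(giant_fraction_theory(4), giant_fraction_simulated(20000, 4, seed=1))
```

For $c = 4$, both estimates come out close to 0.98.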

Algorithm Description. 

1. Create a constant number $c/2$ of chains for each user, each running up to a
distinguished point.

2. Sort the distinguished points.

3. Group the chains ending at the same distinguished point into subsets, in
which we test whether the key candidate is the correct one. It is indeed easy
to check with a few more queries whether the XOR of two keys is correct.

4. Construct the giant component and expect the public user (the user with
the unkeyed function) to lie in this giant component. To this end, we start
with a set of reachable users containing only the public user. Then, we add
to this set all the users that are in a group where a reachable user is present.
At some point, the reachable set is stable and we stop.

5 Or key pair, depending on whether we are considering the single- or dual-key scheme.
5. From the public user, we traverse the giant component and determine the
keys of each user.
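Steps 4 and 5 amount to a graph traversal from the public user. The sketch below simplifies the procedure in two ways. First, each verified collision is represented as an edge `(i, j, label)` with label $K_i \oplus K_j$, rather than as a group of chains. Second, the public user's key is taken to be 0, since that user has the unkeyed function. The function name `propagate_keys` is hypothetical.

```python
import random
from collections import deque


def propagate_keys(num_users, edges, public_user=0):
    """Hypothetical sketch of steps 4-5: breadth-first traversal from the
    public user, whose key is taken as 0, deriving K_v = K_u xor label for
    every edge (u, v, label) with label = K_u xor K_v."""
    adjacency = {u: [] for u in range(num_users)}
    for i, j, label in edges:
        adjacency[i].append((j, label))
        adjacency[j].append((i, label))

    keys = {public_user: 0}
    queue = deque([public_user])
    while queue:
        u = queue.popleft()
        for v, label in adjacency[u]:
            if v not in keys:
                keys[v] = keys[u] ^ label
                queue.append(v)
    return keys


if __name__ == "__main__":
    rng = random.Random(7)
    num_users = 1000
    true_keys = [0] + [rng.getrandbits(64) for _ in range(num_users - 1)]
    edges = []
    for _ in range(2 * num_users):  # c*L/2 edges with c = 4
        i, j = rng.randrange(num_users), rng.randrange(num_users)
        edges.append((i, j, true_keys[i] ^ true_keys[j]))
    recovered = propagate_keys(num_users, edges)
    assert all(recovered[u] == true_keys[u] for u in recovered)
    print(f"recovered {len(recovered)} of {num_users} keys")
```

Users outside the component reached from the public user simply do not appear in the returned dictionary.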

The first step requires $cL\ell/2$ data and time $O(c\ell)$ on average per user, where
$\ell$ is the average length of the chains. The remaining steps are then performed
in time linear in the number of users $L$. Typical parameters are as follows: for
a small positive constant $c$, with $N^{1/3}$ users, $c \cdot N^{1/3}$ queries per user and
$N^{1/3}$ unkeyed queries, we expect to recover almost all the $N^{1/3}$ keys with
overwhelming probability. If we want to recover the keys of all users, we need
$L \ln L = c N^{1/3} \ln N = N^{1/3+\epsilon}$ edges (instead of $cL/2$) to connect all
components, according to the coupon collector's problem.

Analysis of the attack. We want to use results from graph theory to prove the
correctness of our algorithm; this means that we have to prove that the
assumptions of the giant component theorem are satisfied. We have to show that we
construct a random graph according to the Erdős-Rényi model of random
graphs, in which each possible edge connecting pairs of a given set of $L$ vertices
is present, independently of the other edges, with probability $p$. In this model,
we know that if the number of edges $cL/2$ is larger than the number of vertices $L$,
there is with high probability a single giant component, with all other
components having size $O(\log L)$ [6].

Consequently, we need to prove that we construct a random graph and that
the edges are added independently of each other. We define an idealized
version of the attack and show that the attack works in this version.
Then, we prove that the idealized version and the attack are equivalent
using a simulation argument.

In the idealized model, the simulator chooses $L$ keys $K_1, \dots, K_L$
uniformly at random. Then it iterates the functions $F_{EM}^{(i)}(x) = K_i \oplus F_\pi(x \oplus K_i)$
until $x_\ell \oplus K_i \in S_0$, where $S_0$ is the set of pairs containing a distinguished point
$d_i$ and an identificator $id(d_i)$ of this point. The identificators are unique, which
means that there are no collisions on them. Finally, the simulator reveals the
identificator of the point $x_\ell \oplus K_i$ and the point $x_\ell$. The value $K_i$ cannot be
recovered from the information that the simulator returns.

To show that the attack works in this ideal model, we just have to see that
if two users have the same identificator, then $x_\ell \oplus K_i = x'_{\ell'} \oplus K_j$ and therefore
$x_\ell \oplus x'_{\ell'} = K_i \oplus K_j$, which is the same information as in the real attack.

Now, we prove that the simulator does not need to know $F_{EM}^{(i)}$: it can
simulate the information by using only the public random function $F_\pi$, and
the distribution of its outputs is indistinguishable from the idealized model. The
simulator generates $L$ random keys for the EM scheme. For each key,
we show that the pairs (distinguished point, identificator) can be generated
using only $F_\pi$. Indeed, $x_\ell$, the $\ell$-th iterate of $F_{EM}^{(i)}$ with key $K_i$ from the value
$x_0$, is equal to $K_i \oplus y_\ell$, where $y_\ell$ is the $\ell$-th iterate of the
public function $F_\pi$ from the value $x_0 \oplus K_i$. Consequently, to generate the pairs
(distinguished point, identificator), the simulator can compute $(x_\ell \oplus K_i, id(x_\ell \oplus K_i))$
without interacting with the users. As in this last case the pairs are generated
at random without interacting with the users or knowing their functions, and since
the function $F_\pi$ is random, the edges in the graph are added at random and
independently of each other, so that the graph is a random graph according to
the Erdős-Rényi model.
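The identity used by the simulator can be checked on a toy example. The identity states that the $\ell$-th iterate of $F_{EM}^{(i)}$ from $x_0$ equals $K_i$ XOR the $\ell$-th iterate of $F_\pi$ from $x_0 \oplus K_i$. The sketch assumes 12-bit values and uses a random lookup table as a stand-in for $F_\pi$. The helper names are ours. The sketch also shows that two users whose unmasked values coincide produce parallel chains, whose XOR is constantly $K_i \oplus K_j$.

```python
import random


def make_random_function(n_bits, seed):
    """Toy stand-in for the public function F_pi: a random lookup table."""
    rng = random.Random(seed)
    table = [rng.randrange(1 << n_bits) for _ in range(1 << n_bits)]
    return lambda x: table[x]


def user_chain(f_pi, key, x0, steps):
    """Iterate F_EM^(i)(x) = K_i xor F_pi(x xor K_i) from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(key ^ f_pi(xs[-1] ^ key))
    return xs


def public_chain(f_pi, y0, steps):
    """Iterate the public function F_pi alone from y0."""
    ys = [y0]
    for _ in range(steps):
        ys.append(f_pi(ys[-1]))
    return ys


if __name__ == "__main__":
    f_pi = make_random_function(12, seed=1)
    k1, k2, x0 = 0x3A5, 0x9C1, 0x123
    xs = user_chain(f_pi, k1, x0, 50)
    ys = public_chain(f_pi, x0 ^ k1, 50)
    # The user chain is the public chain shifted by K_1
    assert all(x == k1 ^ y for x, y in zip(xs, ys))
    # A user-2 chain with the same unmasked start runs parallel to user 1's
    xs2 = user_chain(f_pi, k2, x0 ^ k1 ^ k2, 50)
    assert all(a ^ b == k1 ^ k2 for a, b in zip(xs, xs2))
    print("simulation identity and parallel chains verified")
```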

Experimental results. We implemented the previous attack on an Even-Mansour
cryptosystem using DES with a fixed key, so that $n = 64$. We simulate $2^{22}$ users,
and for each user we create 8 chains (80 for the public user). We use distinguished
points with 21 zero bits, so the expected chain length is $2^{21}$. We
bound the length of the chains to $2^{24}$: a chain is abandoned if no distinguished
point has been reached after $2^{24}$ evaluations. In all, we generated
33,543,077 chains (out of $2^{25}$ = 33,554,432; the difference corresponds to the
abandoned chains), and the number of groups containing at least two parallel chains
is 4,109,961. Experimentally, the giant component contains 3,788,059 users (among
the 4,194,304), so we can deduce the keys of 90% of the users. This result
is consistent with the theory, since the number of edges in this experiment
is below the number of vertices. The 98% given earlier in Section 3.5 would
require twice as many edges.

The time to generate the chains is 1600 sec using 4096 cores in parallel and 
the analysis of the graph requires a few minutes on a standard PC. 
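As a sanity check on the experimental figures, the short computation below compares the number of missing chains with a simple geometric model. The model assumes that each evaluation independently hits a distinguished point with probability $2^{-21}$. It also ignores the 72 extra chains of the public user.

```python
users = 2 ** 22
chains_per_user = 8
p_dp = 2.0 ** -21          # probability that a point is distinguished (21 zero bits)
max_len = 2 ** 24          # chains longer than this are abandoned

total = users * chains_per_user                   # 2^25 chains
p_abandon = (1 - p_dp) ** max_len                 # roughly exp(-8)
expected_abandoned = total * p_abandon
observed_abandoned = total - 33_543_077           # chains missing in the experiment
giant_share = 3_788_059 / 4_194_304               # users in the giant component

print(f"expected abandoned approx {expected_abandoned:.0f}, observed {observed_abandoned}")
print(f"giant component share approx {giant_share:.3f}")
```

Under this model about 11,256 abandoned chains are expected, against 11,355 observed. The giant component covers about 90.3% of the users.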

4 Attacks on the PRINCE Cipher in the Multi-user and 
Classical Setting 

PRINCE is a lightweight block cipher published at ASIACRYPT 2012 [7]. It
is based on the FX construction [16], which is essentially an Even-Mansour-like
construction. PRINCE has attracted the interest of many cryptanalysts [9,23,15], who
attack either the full cipher or its reduced versions.

The designers of PRINCE claim that its security is ensured up to $2^{127-n}$
operations when an adversary acquires $2^n$ plaintext/ciphertext pairs. This bound
has been reduced in [15] to $2^{126}$ operations with a single plaintext/ciphertext
pair. After a brief presentation of PRINCE, we describe a generic attack in the
multi-user setting that recovers the keys of a pair of users in a set of
$2^{32}$ users with complexity $2^{64}$. The identification of the pair of
users uses an idea similar to the attack on Even-Mansour. However, the details
are different, since PRINCE is not an Even-Mansour scheme: its internal
permutation uses a secret key. Finally, we present another generic attack in the
classical model that, after a precomputation of $2^{96}$ time and $2^{64}$ memory,
recovers the key of every single user in time $2^{32}$. Both attacks work for
all rounds of PRINCE.


4.1 Brief Description of PRINCE 

PRINCE [7] uses a 64-bit block and a 128-bit key which is split into two equal
parts of 64 bits, i.e. $k = k_0 \| k_1$. In order to extend the key to 192 bits, it uses
the mapping $k = (k_0 \| k_1) \rightarrow (k_0 \| k_0' \| k_1)$, where $k_0'$ is derived from $k_0$
by the linear function $L$:

$k_0' = L(k_0) = (k_0 \ggg 1) \oplus (k_0 \gg 63)$,

where $\ggg$ and $\gg$ denote the right rotation and the right shift of a 64-bit word,
respectively. While the subkeys $k_0$ and $k_0'$ are used as input and output whitening keys,
the 64-bit key $k_1$ is used for the 12-round internal block cipher, which is called
PRINCE$_{core}$. For simplicity, we refer to it as the core of PRINCE or simply the core
function, and we denote it by Pcore. So every plaintext $P$ is transformed into the
corresponding ciphertext $C$ by the function $E_k(P) = k_0' \oplus \mathrm{Pcore}_{k_1}(P \oplus k_0)$,
where Pcore uses the key $k_1$ (see Fig. 1).
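The whitening-key derivation $k_0' = (k_0 \ggg 1) \oplus (k_0 \gg 63)$ is easy to get wrong, because it mixes a rotation with a shift. Here is a minimal Python sketch of it on 64-bit integers. It assumes that the 128-bit key is given as an integer with $k_0$ in the high half. The helper names are ours.

```python
MASK64 = (1 << 64) - 1


def rotr64(x, r):
    """Rotate a 64-bit word right by r positions (0 < r < 64)."""
    return ((x >> r) | (x << (64 - r))) & MASK64


def derive_k0_prime(k0):
    """Output whitening key: k0' = (k0 >>> 1) xor (k0 >> 63)."""
    return rotr64(k0, 1) ^ (k0 >> 63)


def split_key(k):
    """Split a 128-bit key k = k0 || k1 and extend it to (k0, k0', k1)."""
    k0, k1 = k >> 64, k & MASK64
    return k0, derive_k0_prime(k0), k1


if __name__ == "__main__":
    print(hex(derive_k0_prime(1)), hex(derive_k0_prime(1 << 63)))
```

The second example shows the effect of the shift term. The top bit of $k_0$ is rotated down one position, and it is also XORed into the lowest bit.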


Fig. 1. Structure of PRINCE

The core function consists of a key $k_1$ addition, a round constant ($RC_0$)
addition, five forward rounds, a middle round, five backward rounds, and finally a
round constant ($RC_{11}$) addition and a key $k_1$ addition. The full schedule of the
core is shown in Fig. 2.
Fig. 2. Structure of the core of PRINCE

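Each forward round of the core is composed of a 4-bit S-box layer ($S$), a linear
layer (a $64 \times 64$ matrix $M$), the addition of a round constant $RC_i$ for $i \in \{1, \dots, 5\}$,
and the addition of the key $k_1$. The linear layer is defined as $M = SR \circ M'$,
where $SR$ is the following permutation of the 16 nibbles (as in the AES ShiftRows):

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) → (0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11).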
The $M'$ layer, which is only used in the middle round, can be seen as a mirror
in the middle of the core, as the 5 backward rounds are defined as the inverses of
the 5 forward rounds.

In every $RC_i$-add step, a 64-bit round constant is XORed with the state.
It should be noted that $RC_i \oplus RC_{11-i} = \alpha$ = 0xc0ac29b7c97c50dd for all
$0 \le i \le 11$. From this, and from the fact that the matrix $M'$ is an involution,
we can perform the decryption of PRINCE by simply running the encryption
procedure with the keys $k_0$ and $k_0'$ swapped and with the key $k_1 \oplus \alpha$ instead
of $k_1$. That is, for any key $(k_0 \| k_0' \| k_1)$, we have

$D_{(k_0 \| k_0' \| k_1)}(\cdot) = E_{(k_0' \| k_0 \| k_1 \oplus \alpha)}(\cdot)$.

This property is called the $\alpha$-reflection property of PRINCE.


4.2 Attack on PRINCE in the Multi-user Setting 

In the multi-user setting, we assume that we have $L$ different users who all
use the block cipher PRINCE. Each user $U_i$, with $0 \le i < L$, chooses her key
$k^{(i)} = k_0^{(i)} \| k_1^{(i)}$ at random and independently from all the other users. In order
to attack PRINCE using the distinguished point method, we first construct a
set of chains for every user using the PRINCE encryption function. For this, we use the
function defined as follows:

$F_{k^{(i)}}(x) = x \oplus \mathrm{PRINCE}_{k_0^{(i)} \| k_0'^{(i)} \| k_1^{(i)}}(x) \oplus \mathrm{PRINCE}_{k_0^{(i)} \| k_0'^{(i)} \| k_1^{(i)}}(x \oplus \delta)$,

where $\delta$ is an arbitrary but fixed non-zero constant. The key $k_0'^{(i)}$ vanishes from
the equation, and the function $F$ thus takes the following form:

$F_{k^{(i)}}(x) = x \oplus \mathrm{Pcore}_{k_1^{(i)}}(x \oplus k_0^{(i)}) \oplus \mathrm{Pcore}_{k_1^{(i)}}(x \oplus k_0^{(i)} \oplus \delta)$.
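The cancellation of $k_0'$ can be illustrated on a toy FX-style cipher. In the sketch below, the 16-bit block size, the constant $\delta = 1$ and the keyed random permutation `toy_core` are all assumptions. They stand in for PRINCE and its core; they are not the real PRINCE components. Two instances that differ only in $k_0'$ yield the same function $F$.

```python
import random

N_BITS = 16
DELTA = 0x0001  # arbitrary fixed non-zero constant (delta in the text)


def toy_core(k1):
    """Hypothetical stand-in for Pcore: a random permutation of 16-bit values
    selected by k1. It is NOT the real PRINCE core."""
    perm = list(range(1 << N_BITS))
    random.Random(k1).shuffle(perm)
    return lambda x: perm[x]


def toy_fx_encrypt(k0, k0_prime, k1):
    """FX-style encryption E(P) = k0' xor core_k1(P xor k0)."""
    core = toy_core(k1)
    return lambda p: k0_prime ^ core(p ^ k0)


def chain_function(encrypt):
    """F(x) = x xor E(x) xor E(x xor delta); the two k0' terms cancel."""
    return lambda x: x ^ encrypt(x) ^ encrypt(x ^ DELTA)


if __name__ == "__main__":
    f_a = chain_function(toy_fx_encrypt(0x1234, 0x0000, k1=99))
    f_b = chain_function(toy_fx_encrypt(0x1234, 0xBEEF, k1=99))
    assert all(f_a(x) == f_b(x) for x in range(0, 1 << N_BITS, 97))
    print("F does not depend on k0'")
```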

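For every user $U_i$, we create one encryption ($\mathcal{E}$) chain and one decryption ($\mathcal{D}$)
chain, both based on the function $F$ defined above. $\mathcal{E}$ uses the encryption
function of PRINCE, whereas $\mathcal{D}$ uses the decryption function. For the
user $U_i$, we thus define the functions $\mathcal{E}$ and $\mathcal{D}$ as follows:

$\mathcal{E}_{k_0^{(i)}, k_1^{(i)}}(x_j^{(i)}) = x_{j+1}^{(i)} = x_j^{(i)} \oplus \mathrm{Pcore}_{k_1^{(i)}}(x_j^{(i)} \oplus k_0^{(i)}) \oplus \mathrm{Pcore}_{k_1^{(i)}}(x_j^{(i)} \oplus k_0^{(i)} \oplus \delta)$

$\mathcal{D}_{k_0'^{(i)}, k_1^{(i)} \oplus \alpha}(y_j^{(i)}) = y_{j+1}^{(i)} = y_j^{(i)} \oplus \mathrm{Pcore}_{k_1^{(i)} \oplus \alpha}(y_j^{(i)} \oplus k_0'^{(i)}) \oplus \mathrm{Pcore}_{k_1^{(i)} \oplus \alpha}(y_j^{(i)} \oplus k_0'^{(i)} \oplus \delta)$.

Let us define:

$f_{\mathcal{E}} = \mathrm{Pcore}_{k_1^{(i)}}(x_j^{(i)} \oplus k_0^{(i)}) \oplus \mathrm{Pcore}_{k_1^{(i)}}(x_j^{(i)} \oplus k_0^{(i)} \oplus \delta)$ and

$f_{\mathcal{D}} = \mathrm{Pcore}_{k_1^{(i)} \oplus \alpha}(y_j^{(i)} \oplus k_0'^{(i)}) \oplus \mathrm{Pcore}_{k_1^{(i)} \oplus \alpha}(y_j^{(i)} \oplus k_0'^{(i)} \oplus \delta)$.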

We create encryption chains until $f_{\mathcal{E}}$ reaches a distinguished point (resp.
decryption chains until $f_{\mathcal{D}}$ reaches a distinguished point). We then search
for a collision between the encryption and the decryption chains.

Let us consider two users, $U_1$ and $U_2$. Whenever the chains $\mathcal{E}_{k_0^{(1)}, k_1^{(1)}}(x_j^{(1)})$ and
$\mathcal{D}_{k_0'^{(2)}, k_1^{(2)} \oplus \alpha}(y_j^{(2)})$ arrive at the same distinguished point, we suspect that these
two chains have become parallel. As the core of PRINCE is parametrized only by
the key $k_1$, when we arrive at the same distinguished point we obtain a probable
collision between the keys $k_1^{(1)}$ and $k_1^{(2)} \oplus \alpha$ used in Pcore. However, we must verify
that this is a real collision and not just a random coincidence. For this, we verify
that the next values of $f_{\mathcal{E}}$ and $f_{\mathcal{D}}$ after the distinguished point remain
equal. If we have obtained a real collision, we know that:

$k_1^{(1)} = k_1^{(2)} \oplus \alpha$.

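This indicates that $x^{(1)} \oplus y^{(2)}$ is expected to equal $k_0^{(1)} \oplus k_0'^{(2)}$. Moreover,
since $k_1^{(1)} = k_1^{(2)} \oplus \alpha$, we also have $k_1^{(1)} \oplus \alpha = k_1^{(2)}$, so the decryption chain
of $U_1$ and the encryption chain of $U_2$ use the same core key; in the same way,
this gives us $k_0'^{(1)} \oplus k_0^{(2)}$.

Thus, we have:

$k_0^{(1)} \oplus k_0'^{(2)} = A$ and $k_0'^{(1)} \oplus k_0^{(2)} = B$  (*).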

Let $\{a_{63}, \dots, a_0\}$ be the bits of $k_0^{(1)}$ and $\{b_{63}, \dots, b_0\}$ the bits of
$k_0^{(2)}$. As, by the definition of PRINCE, $k_0' = (k_0 \ggg 1) \oplus (k_0 \gg 63)$, we have:

$k_0'^{(1)} = \{a_0, a_{63}, \dots, a_2, a_1 \oplus a_{63}\}$ and $k_0'^{(2)} = \{b_0, b_{63}, \dots, b_2, b_1 \oplus b_{63}\}$.

From (*), we construct the system:

$\{a_{63}, \dots, a_0\} \oplus \{b_0, b_{63}, \dots, b_2, b_1 \oplus b_{63}\} = \{A_{63}, \dots, A_0\}$

$\{b_{63}, \dots, b_0\} \oplus \{a_0, a_{63}, \dots, a_2, a_1 \oplus a_{63}\} = \{B_{63}, \dots, B_0\}$

As this is an invertible linear system, we can easily find $k_0^{(1)}$ and $k_0^{(2)}$. Note
that once $k_0^{(1)}$ has been found, recovering $k_1^{(1)}$ can be done with an exhaustive
search whose cost is $2^{64}$.
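The linear system can be solved directly on 64-bit words. The sketch below uses our own formulation, not the paper's. It relies on the identity $A \oplus L(B) = k_0^{(1)} \oplus L(L(k_0^{(1)}))$, where $L$ is the whitening-key map. It solves this equation over GF(2) by Gaussian elimination and then sets $k_0^{(2)} = B \oplus L(k_0^{(1)})$. The helper names are hypothetical.

```python
import random

MASK64 = (1 << 64) - 1


def L(k):
    """PRINCE whitening-key map: k0' = (k0 >>> 1) xor (k0 >> 63)."""
    return (((k >> 1) | (k << 63)) & MASK64) ^ (k >> 63)


def solve_gf2(f, target, n=64):
    """Solve f(x) = target for an invertible GF(2)-linear map f on n-bit ints."""
    pivots = {}  # leading bit -> (image, preimage) with f(preimage) == image
    for i in range(n):
        img, pre = f(1 << i), 1 << i
        while img:
            top = img.bit_length() - 1
            if top not in pivots:
                pivots[top] = (img, pre)
                break
            p_img, p_pre = pivots[top]
            img, pre = img ^ p_img, pre ^ p_pre
        else:  # the image reduced to zero: the map is singular
            raise ValueError("linear map is not invertible")
    x = 0
    while target:
        p_img, p_pre = pivots[target.bit_length() - 1]
        target, x = target ^ p_img, x ^ p_pre
    return x


def recover_k0(A, B):
    """Given A = k0_1 ^ L(k0_2) and B = k0_2 ^ L(k0_1), return (k0_1, k0_2)."""
    k0_1 = solve_gf2(lambda x: x ^ L(L(x)), A ^ L(B))
    return k0_1, B ^ L(k0_1)


if __name__ == "__main__":
    rng = random.Random(2014)
    k0_1, k0_2 = rng.getrandbits(64), rng.getrandbits(64)
    assert recover_k0(k0_1 ^ L(k0_2), k0_2 ^ L(k0_1)) == (k0_1, k0_2)
    print("recovered both k0 values")
```

The map $x \mapsto x \oplus L(L(x))$ equals $(I \oplus L)^2$ over GF(2). It is invertible because $L$ has no non-zero fixed point.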

Analysis of the attack. Once the computation of a chain is finished, we have to
store $(x_{\ell-1}, d, d_{+1})$, where $d$ is the distinguished point, $x_{\ell-1}$ is the point just
before the chain reaches the distinguished point, and $d_{+1}$ is the point just after
it. We need to store $x_{\ell-1}$ because we have to test whether the collision found
is useful, and we also need to store $d_{+1}$ to test whether it is a real collision.
If not, the search must continue.

As mentioned, PRINCE uses a 128-bit key which is split into two 64-bit
parts, i.e. $k = k_0 \| k_1$. The attack consists in identifying, and recovering all the
key material of, a pair of users $i$ and $j$ for whom $k_1^{(i)} = k_1^{(j)} \oplus \alpha$. We expect to
find such a collision between two different users with high probability once the
number of users is at least $2^{32}$. So the attack uses a set of $2^{32}$ users,
and for each one we create 2 chains (an encryption and a decryption chain). The
cost per user is $2^{32}$ operations, and the total cost of recovering the keys $k_0$ of
2 users is approximately $2^{64}$ operations. Recovering $k_1$ by exhaustive search
costs another $2^{64}$. So, in total, we can deduce both $k_0$ and $k_1$ in $2^{65}$ operations.
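The birthday estimate behind the choice of $2^{32}$ users can be made explicit. Assume that the values $k_1^{(i)}$ are independent and uniform over 64 bits. Then each unordered pair of users satisfies $k_1^{(i)} \oplus k_1^{(j)} = \alpha$ with probability $2^{-64}$. The sketch below computes the expected number of such pairs. It also shows how the pairs can be found with a hash table when the keys are known, for example in a small-scale simulation. The function names are ours.

```python
ALPHA = 0xC0AC29B7C97C50DD  # the PRINCE alpha constant


def expected_alpha_pairs(num_users, key_bits=64):
    """Expected number of unordered pairs {i, j} with k1_i xor k1_j = alpha,
    assuming independent uniform key_bits-bit values."""
    return num_users * (num_users - 1) / 2 / 2 ** key_bits


def find_alpha_pairs(k1_values, alpha=ALPHA):
    """Return index pairs (i, j), i < j, with k1_i == k1_j xor alpha."""
    index = {}
    for j, k in enumerate(k1_values):
        index.setdefault(k, []).append(j)
    pairs = []
    for i, k in enumerate(k1_values):
        for j in index.get(k ^ alpha, []):
            if i < j:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    print(expected_alpha_pairs(2 ** 32))
    print(find_alpha_pairs([5, 5 ^ ALPHA, 17]))
```

For $L = 2^{32}$ the expectation is about 1/2, so such a pair appears with constant probability. Doubling the number of users multiplies the expectation by four.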
4.3 Attack in the Classical Model

We show in this section that a classical attack, also using the distinguished
points technique, is possible as well. For this, we create encryption chains
from the function $\mathcal{E}$ defined in Section 4.2.

Precalculation. In the first phase of the attack, we create encryption
chains for every possible key $k_1^{(i)}$ with $0 \le i < 2^{64}$. More specifically, for every
possible $k_1^{(i)}$ we set $k_0^{(i)} = 0$ and we create a chain $S_i$ of length $2^{32}$ from the
function $\mathcal{E}$. We store all chains $S_i$.

Attack. Now, our purpose is to find a collision with one of the chains created
with the zero key $k_0^{(i)} = 0$. For this, from a random starting point $x_0$ and under
the unknown keys $k_0$ and $k_1$, we compute an encryption chain $\mathcal{T}$ from the function $\mathcal{E}$.
The chain $\mathcal{T}$ will collide with high probability with one of the chains $S_i$. As described
in Section 4.2, when we detect a collision between two distinguished points,
we know that the chains have become parallel, and so we obtain $k_0^{(i)} \oplus k_0$ (the
index $i$ of the matching chain also gives $k_1 = k_1^{(i)}$). As the key $k_0^{(i)} = 0$,
we finally obtain the unknown $k_0$.

Analysis of the attack. In the precalculation phase, for each of the $2^{64}$ possible keys
we compute a chain of length $2^{32}$, so the complexity is $2^{96}$. As
we need to store all chains, the precalculation phase also costs $2^{64}$ in
memory. However, once the first phase is over, the attacker can perform the
attack in only $2^{32}$ operations, as she has to compute only one chain. So, the total
cost of the attack is $2^{96}$. The proposed attack satisfies $DT = 2^{128}$, as $D = 2^{32}$
and $T = 2^{96}$. This attack does not improve the complexity of PRINCE given
in |7] and Q3- However, in our case, T is not the on-line time complexity as it 
corresponds to the precalculation phase of the attack. Thus, in our attack, we
have $D T_{on} = 2^{64}$.

5 Conclusion 

In this paper, we have presented new tradeoffs for public-key and symmetric-key
cryptosystems in the multi-user setting. We have introduced algorithmic
tools for collision-based attacks using the distinguished point technique. The
first tool makes it possible to compute the discrete logarithms of $L$ users in parallel
with only an $O(\sqrt{L})$ penalty, by exploiting the behaviour of random graph processes.
The second tool enables key recovery for Even-Mansour and related ciphers; it is a
novel lambda technique to find collisions when two different functions are involved.
For the Even-Mansour cipher, we show new tradeoffs that partially solve an open
problem due to Dunkelman et al., and we propose an analysis in the multi-user
setting. Finally, for the PRINCE cipher, we show generic attacks that improve
the best published results, in the sense that our time complexity corresponds to
a precomputation phase and not to an on-line phase. This last result could also
be adapted to similar ciphers such as DESX and would also improve on the best
previous attack.
Abstract. The iterated Even-Mansour (EM) scheme is a generalization
of the original 1-round construction proposed in 1991, and can use one
key, two keys, or completely independent keys. In this paper, we
methodically analyze the security of all the possible iterated Even-Mansour
schemes with two n-bit keys and up to four rounds, and show that none
of them provides more than n-bit security. Our attacks are based on a
new cryptanalytic technique called multibridge, which splits the cipher
into different parts in a novel way, such that they can be analyzed
independently, exploiting its self-similarity properties. After the analysis of
the parts, the key suggestions are efficiently joined using a
meet-in-the-middle procedure.

As a demonstration of the multibridge technique, we devise a new
attack on 4 steps of the LED-128 block cipher, reducing the time
complexity of the best known attack on this scheme from $2^{96}$ to $2^{64}$. Furthermore,
we show that our technique can be used as a generic key-recovery tool
when combined with some statistical distinguishers (like those recently
constructed in reflection cryptanalysis of GOST and PRINCE).

Keywords: Cryptanalysis, meet-in-the-middle attacks, multibridge
attack, iterated Even-Mansour, LED-128.


1 Introduction 

Most block ciphers (such as the AES) have an iterated structure which
alternately XORs a secret key and applies some publicly known permutation
(typically consisting of S-boxes and linear transformations) to the internal state. A
generic way to describe such a scheme is to assume that the permutations are
randomly chosen, with no weaknesses which can be exploited by the cryptanalyst.
This approach has several advantages. First of all, this is a very clean
construction with great theoretical appeal. In addition, we can use the randomness of
the permutation in order to prove lower bounds on the complexity of all possible
attacks, something we cannot hope to achieve when we instantiate the scheme
with a particular choice of the permutation. Finally, any new generic attack on
block ciphers with this general form can have broad practical applicability.

At Asiacrypt 1991 nn> Even and Mansour defined and analyzed the simplest 
example of such a block cipher, which consists of a single public permutation and 
two independently chosen secret keys XOR’ed before and after the permutation. 
We call such a scheme a 1-round 2-key Even-Mansour (EM) scheme. In their 
paper, Even and Mansour showed that in any attack on this scheme that succeeds
with high probability, $TD \ge 2^n$. This implies that any attack on the scheme has
overall complexity (i.e., the maximal complexity among the time, memory and
data complexities) of at least $2^{n/2}$¹. In such a case, we say that the security of the
scheme is $2^{n/2}$, or $n/2$ bits². At Eurocrypt 2012 [10], a matching upper bound
in the known plaintext attack model was proved, and thus the security of this
scheme is now fully understood.

Since the security provided by a 1-round 2-key EM is much smaller than
the $2^{2n}$ time complexity of exhaustive key search, multiple papers published in
the last couple of years have studied the security of iterated EM schemes with
more than one round (e.g., [2,4,9,18,21,23]). These schemes differ not only in
their number of rounds, but also in the number of keys they use and in the order
in which these keys are used in the various rounds. This is somewhat analogous
to the study of the security of generic Feistel structures with various numbers
of rounds, which led to several fundamental results in theoretical cryptography
in the last two decades (e.g., how to construct pseudo-random permutations
from pseudo-random functions, and how many queries are required in order to
distinguish them from truly random permutations [19,24]).

In this paper, we study the security of iterated EM constructions using two
independent keys. As the security of the 1-round variant is already determined
to be $2^{n/2}$, and as it is easy to see that a 2-round variant supplies security of at
most $2^n$, we analyze all 3-round and 4-round variants with two keys. We show
that for any possible ordering of the two keys, all the r-round variants with
$r \le 4$ provide security of at most $2^n$ (compared to exhaustive key search, which
requires $2^{2n}$ time). Furthermore, for all such variants³, we obtain a complete
tradeoff curve of $DT = 2^{2n}$ in the known plaintext attack model.


1 We define security in the computational model, which calculates the time complexity 
according to the number of operations that the attacker performs. This model is 
different from the information theoretical model (used, for example, in [4]), which 
only considers the number of queries to the internal permutations of the primitive. 

2 Note that, unlike the tradeoff attacks described in Hellman's paper nu, the overall
complexity of an attack takes into account all attack stages. In particular, we do 
not allow any free preprocessing stage. 

3 Not including some weak variants, for which an attack of time complexity $2^n$ can
be obtained given only 2 plaintext-ciphertext pairs (i.e., the unicity bound).
Since several concrete proposals for block ciphers use a relatively small
number of fairly complex rounds, our theoretical analysis has immediate
practical applications. For example, we can use our results in order to compare the
best achievable security of schemes with various numbers of rounds and key
schedules, and thus to guide the design of future schemes. More surprisingly,
we can use our new generic attacks in order to improve by a large margin the
running time of the best known attacks on the extensively studied lightweight
block cipher LED-128, without even looking at its internal structure.

LED-128 [13] is a typical example of an iterated EM scheme. It is a 64-bit
block cipher that uses two unrelated 64-bit keys, which are alternately XORed
in consecutive rounds. Since its publication at CHES 2011, reduced variants
of LED-128 have been extensively analyzed, and in particular the 4-step⁴
variant (reduced from the full 12 steps) was analyzed in 3 consecutive papers at ACISP
2012 [16], Asiacrypt 2012 [21] and FSE 2013 [23], using a variety of cryptanalytic
techniques (see Table 1).


Table 1. Attacks on 4-Step LED-128

Reference    Generic (I)   Data (II)      Time       Memory   Security
[16]         No            2^16 CP        2^112      2^16     2^112
[21]         Yes           2^64 KP        2^96       2^64     2^96
[23]         Yes           D ≤ 2^32 KP    2^128/D    D        2^96
This paper   Yes           D ≤ 2^64 KP    2^128/D    D        2^64

(I) "Generic" stands for an attack independent of the actual step function.
(II) The data complexity is given in chosen plaintexts (CP) or in known plaintexts (KP).


The first attack on 4-step LED-128 is described in [16]. The attack combines
the splice-and-cut technique [3] with a meet-in-the-middle attack which is based
on specific properties of the LED permutation. It has a time complexity of
$T = 2^{112}$ and requires $D = 2^{16}$ chosen plaintext-ciphertext pairs. The second analysis
of 4-step LED-128 is given in [21] and is applicable to all 4-round EM schemes
with 2 alternating keys. When applied to 4-step LED-128, it has a reduced time
complexity of $T = 2^{96}$ (compared to $T = 2^{112}$ for the attack of [16]), but it
requires the full codebook of $D = 2^{64}$ plaintext-ciphertext pairs. The attack
uses a technique related to Merkle and Hellman's attack on two-key triple-DES
(2K3DES) [22], in combination with Daemen's chosen plaintext attack on EM [6].
Finally, the currently best known attack on 4-step LED-128 is a known plaintext
attack given in [23]. The attack uses an extension of the SlideX attack [10] in
order to obtain a flexible tradeoff curve of $TD = 2^{128}$ for any $D \le N^{1/2}$.

4 In the design of LED, the term “step” is used in order to describe what we refer to 
as a “round” of an iterated EM scheme. 
By using our new generic attack on 4-round EM with alternating keys, we
can extend the tradeoff curve all the way to $D = N$. We can thus reduce the
time complexity of the best known attack on 4-step LED-128 by a large factor
of $2^{32}$, from the totally impractical $T = 2^{96}$ to a more practical $T = 2^{64}$. We
note that when considering much smaller improvements over exhaustive search,
attacks on up to 8 rounds of LED-128 have been published in [9].

In order to obtain our improved generic attacks, we had to develop a new
cryptanalytic technique. The new technique stems from the dissection
technique [7] and from the splice-and-cut technique [3], but also has additional
features. Like the dissection technique, it divides the cipher into several parts
treated independently by enumerating over an intermediate value, but unlike
dissection, the parts are not consecutive but rather nested. In addition, like the
splice-and-cut technique, the new attack takes advantage of “splicing” (or
connecting) two ends of the cipher together. However, in the original splice-and-cut
technique, the plaintexts and ciphertexts were “spliced” together, and as a result
it was essentially a chosen plaintext attack. In our attack, on the other hand, we
bridge (or connect) together intermediate encryption values, and thus our attack
does not have this constraint and can use known plaintexts. Once we connect
a pair of intermediate encryption values using a bridge, we use a self-similarity
property of the cipher in order to connect another pair of intermediate
encryption values using another bridge. Thus, as our attack bridges between multiple
parts of the cipher using multiple bridges, we call it the multibridge attack.

In addition to their application to iterated Even-Mansour ciphers with two
keys, we notice that our techniques can also be combined with statistical
distinguishers to give efficient key recovery attacks on certain block ciphers. These
block ciphers have internal symmetric properties which allow us to connect
(bridge) together intermediate encryption values at a relatively low cost. Such 
bridges are constructed in reflection cryptanalysis, a technique introduced by 
Kara in m, and generalized more recently by Soleimany et al. in [28]. Thus, as 
an additional application of our multibridge attack, we show how to use it as a 
generic key-recovery tool in reflection cryptanalysis. 

The self-similarity properties of the cipher that we exploit in multibridge
attacks are similar to the ones exploited in the SlideX attack [10] on 1-round
EM with one key and in later publications [9,23]. However, in the multibridge
attack the connected parts are more complex, are themselves analyzed using bridging
techniques, and are joined using several meet-in-the-middle attacks.

The paper is organized as follows: in Section 2 we describe the notations and
conventions used in this paper. In Section 3 we describe our new multibridge
attack on the alternating key scheme, and its application to LED-128 and to
reflection cryptanalysis. In Section 4 we classify all 4-round iterated EM schemes
with two keys and summarize our attacks on them. We finish the analysis of
4-round iterated EM schemes in Section 5, and finally propose open problems and
conclude the paper in Section 6.
2 Notations and Conventions

Notations. For a general r-round iterated EM scheme with a block size of n bits,
we denote by $F_i$ the public function of round $i$. We denote by $K_{i-1}$ the round key
added at the beginning of round $i$ (i.e., $K_0$ is added before round 1), while the last
round key is denoted by $K_r$ (see Fig. 1). Given a plaintext-ciphertext pair $(P, C)$,
we denote the state after $i$ encryption rounds by $X_i$ (e.g., $X_0 = P$, $X_1$ is the state
after one encryption round, etc.). In order to simplify our notation, we define
$\hat{X}_i = X_i \oplus K_i$, and so $F_{i+1}(\hat{X}_i) = X_{i+1}$. In some of our attacks, we consider
several parallel evaluations, which are similarly denoted by $Y_{i+1} = F_{i+1}(\hat{Y}_i)$ and
$Z_{i+1} = F_{i+1}(\hat{Z}_i)$.

Conventions. In this paper, we evaluate our attack algorithms in terms of
the time complexity $T$, the data complexity $D$, and the memory complexity
$M$, as a function of $N = 2^n$, where $n$ is the block size. Note that this $N$ is
not necessarily the size of the key space, and exhaustive search of a 2-key EM
scheme requires $N^2$ rather than $N$ time. The complexities of our algorithms are
generally exponential in $n$, and thus we can neglect multiplicative polynomial
factors in $n$ in our analysis.

We note that in all of our memory-consuming attacks, it is possible to use
time-memory tradeoffs in order to reduce the amount of memory we use. However,
in this paper we are mainly interested in tradeoffs between the data and time
complexities of our attacks, and thus we simply assume that we have sufficient
memory to execute the fastest possible version of the attack, i.e., given $D$ known
plaintext-ciphertext pairs, we always try to minimize $T$.



Fig. 1. Iterated Even-Mansour


3 A New Attack on 4-Round Iterated Even-Mansour 
with Two Alternating Keys 

The currently best known attack on the 4-round iterated EM scheme with 2 alternating
keys (see Fig. 2) was proposed in [23] as part of the analysis of 4-step
LED-128 (improving the previous attacks of [16, 21]). The attack yields a tradeoff
curve of $TD = N^2$, but is limited by an expensive outer loop that guesses one
of the keys and performs computations on the entire data for each such guess.
Therefore, the tradeoff $TD = N^2$ is restricted by the constraint $T \ge ND$ (or
$TD \ge ND^2$) and is valid only up to $D = N^{1/2}$. Consequently, the attack cannot
efficiently exploit more than $D = N^{1/2}$ known plaintexts even when they are
available. In this section, we describe a new attack, which can obtain the curve
$TD = N^2$ for any amount of given data $D \le N$. In order to provide sufficient
background to our new attack, we start by describing the very simple variant
of the SlideX attack (proposed in [10]) on 1-round EM with one key, and then
describe the previous attack of [23] on 4-round iterated Even-Mansour with 2
alternating keys. After this background material, we describe the basic variant
of our new attack on this scheme that applies in the case $D = N$, and then
generalize the basic attack in order to obtain the complete curve $TD = N^2$. Finally,
we apply the multibridge attack to 4-step LED-128, improving the running time
of the best known attack on this well-studied scheme from $2^{96}$ to $2^{64}$.
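Spelling out the constraint on the previous attack, using only the bounds stated above (the value $N = 2^{64}$ corresponds to the 64-bit block of LED-128):

```latex
\begin{aligned}
&\text{Previous attack: } T \ge N D \text{ and } T = N^2/D
  \;\Longrightarrow\; N^2/D \ge N D
  \;\Longrightarrow\; D \le N^{1/2},\quad T \ge N^{3/2}.\\
&\text{With } N = 2^{64}:\quad T \ge 2^{96}.\\
&\text{New attack: } T = N^2/D \text{ for all } D \le N
  \;\Longrightarrow\; T = N = 2^{64} \text{ at } D = N.
\end{aligned}
```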



Fig. 2. 4-Round Iterated Even-Mansour with Alternating Keys


3.1 The SlideX Attack on 1-Round Even-Mansour with a Single 
Key 

The SlideX attack [10] is an optimal known plaintext attack on 1-round EM
with one key. It is based on the observation that for each plaintext-ciphertext
pair $(P, C)$, by definition $P \oplus K = X'_0$ and $C \oplus K = X_1$, hence
$P \oplus C = X'_0 \oplus X_1$ (see Fig. 3). As described in the attack below, this equality
is exploited in order to match the plaintext-ciphertext pairs with independent
evaluations of the public function $F_1$ by the attacker. Each such match yields a
suggestion for the key, which we can easily test.

1. For each of the D plaintext-ciphertext pairs $(P^i, C^i)$:

(a) Calculate $P^i \oplus C^i$, and store it in a sorted list L, next to $P^i$.

2. For N/D arbitrary values $Y'^j_0$:

(a) Compute $Y^j_1 = F_1(Y'^j_0)$ and search $Y'^j_0 \oplus Y^j_1$ in the list L.

(b) For each match, obtain $P^i$ and compute the suggestion $K = P^i \oplus Y'^j_0$.

(c) Test the suggestion for K using a trial encryption, and if it succeeds,
return it as the key.

As we have D plaintext-ciphertext pairs $(P^i, C^i)$ and we evaluate N/D arbitrary
values $Y'^j_0$, we have $D \cdot N/D = N$ pairs of the form (i, j). Thus, according
to the birthday paradox, with high probability there is a pair (i, j) such that
$Y'^j_0 = P^i \oplus K = X'^i_0$. This implies that $Y'^j_0 \oplus Y^j_1 = P^i \oplus C^i$, and thus we get
a match in Step 2.(a), suggesting the correct key K. The time complexity of
Step 1 is D. The time complexity of Step 2 is N/D, since for an arbitrary value
of $Y'^j_0 \oplus Y^j_1$, we expect a match in Step 2.(a) with probability D/N (and thus,
on average, we perform only a constant number of operations for each value of
$Y'^j_0$). Consequently, the time complexity of the attack is $\max(D, N/D)$, i.e., the
attack gives a tradeoff curve of $TD = N$, but only for $D \le N^{1/2}$ (i.e., it cannot
efficiently exploit more than $D = N^{1/2}$ known plaintexts).
Fig. 3. The SlideX Attack on 1-Round Even-Mansour with 1 Key
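The SlideX steps above can be sketched in Python on a toy instance. This is a minimal illustration under stated assumptions, not the authors' implementation: $F_1$ is a random lookup table on n = 12 bits, and the function name `slidex_attack` and the trial-encryption check on four known pairs are hypothetical choices. One deliberate deviation: the values $Y'^j_0$ are drawn without replacement from all N values and the loop stops at the first confirmed key, so success is certain while the expected number of values tried remains about N/D.

```python
import random


def random_perm(n_bits, rng):
    """Lookup table of a random permutation F_1 on n-bit values."""
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    return table


def slidex_attack(pairs, F, n_bits, rng):
    """SlideX on 1-round single-key EM, C = F(P xor K) xor K.

    Returns (key, number of Y'_0 values tried), or (None, N) on failure.
    """
    N = 1 << n_bits
    L = {}
    for p, c in pairs:                        # Step 1: L maps P xor C -> [P]
        L.setdefault(p ^ c, []).append(p)
    # Step 2: values Y'_0, taken without replacement (a choice of this sketch).
    order = list(range(N))
    rng.shuffle(order)
    for tried, y0p in enumerate(order, start=1):
        y1 = F[y0p]                           # Y_1 = F_1(Y'_0)
        for p in L.get(y0p ^ y1, []):
            k = p ^ y0p                       # suggestion K = P xor Y'_0
            # Trial encryption on a few known pairs.
            if all(F[pp ^ k] ^ k == cc for pp, cc in pairs[:4]):
                return k, tried
    return None, N


rng = random.Random(2014)
n, D = 12, 64
F = random_perm(n, rng)
K = rng.randrange(1 << n)
plaintexts = rng.sample(range(1 << n), D)
pairs = [(p, F[p ^ K] ^ K) for p in plaintexts]
recovered, tried = slidex_attack(pairs, F, n, rng)
print(recovered == K, tried)  # tried is typically of the order of N/D = 64
```

A dictionary plays the role of the sorted list L; either gives constant or logarithmic lookups, which the analysis treats as negligible polynomial factors.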


3.2 The Best Previous Attack on 4-Round Iterated Even-Mansour
with Two Alternating Keys [23]

The best previous attack [23] starts by guessing $K_0$. This guess makes it possible
to eliminate the first and last XOR'ed keys, and thus also the first and
last permutations, by partially encrypting the plaintext (and decrypting the
ciphertext). In addition, guessing $K_0$ enables the attacker to combine the second
and third applications of the permutations, $F_3(F_2(x) \oplus K_0)$, into a single
known permutation $F'_{K_0}(x)$. This reduces the 4-round EM scheme to a single-round
EM scheme with a single key, which can be easily attacked by the SlideX
technique (see Fig. 4). The details of this attack are described below.

1. For all values of $K_0$:

(a) For each of the D plaintext-ciphertext pairs $(P^i, C^i)$:

i. Compute $X^i_1$ and $X'^i_3$, and store $X^i_1 \oplus X'^i_3$ in a sorted list L, next to
$X^i_1$.

(b) For N/D arbitrary values $Y'^j_1$:

i. Compute $Y^j_3 = F'_{K_0}(Y'^j_1)$ and search $Y'^j_1 \oplus Y^j_3$ in the list L.

ii. For each match, obtain $X^i_1$ and compute the suggestion $K_1 = X^i_1 \oplus Y'^j_1$.

iii. Test the suggestion for the full key $(K_0, K_1)$ using a trial encryption,
and if it succeeds, return it.

For the correct value of $K_0$, according to the birthday paradox, with high
probability there is a pair (i, j) such that $Y'^j_1 = X'^i_1$. This implies that $X^i_1 \oplus X'^i_3 =
Y'^j_1 \oplus Y^j_3$, and thus we get a match in Step 1.(b).i, suggesting the correct key
$(K_0, K_1)$. The time complexity of Step 1.(a) is D, and the complexity of Step
1.(b) is N/D (we do not expect more than one match in L in Step 1.(b).i for an
arbitrary value of $Y'^j_1 \oplus Y^j_3$). Thus, for each value of $K_0$ that we guess in Step 1,
we perform $\max(D, N/D)$ operations. Consequently, the attack gives a tradeoff
curve of $TD = N^2$, but only for $D \le N^{1/2}$, i.e., the time complexity must satisfy
$T \ge N^{3/2}$. In particular, for $N = 2^{64}$, the best possible time complexity of this
attack (for any available amount of data) is at least $2^{96}$.
Fig. 4. The Best Previous Attack on 4-Round Iterated Even-Mansour with Two
Alternating Keys
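A toy Python sketch of this previous attack follows, under the same kind of assumptions (random lookup-table permutations on n = 8 bits; the names `encrypt`, `previous_attack` and the `margin` parameter are hypothetical). It guesses $K_0$, peels off rounds 1 and 4, and runs SlideX with key $K_1$ on the combined permutation $F'_{K_0}$. The sketch samples about margin · N/D values $Y'^j_1$ per guess instead of exactly N/D, so that the correct guess succeeds with overwhelming probability in the toy.

```python
import random


def random_perm(n_bits, rng):
    """Random permutation table and its inverse."""
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def encrypt(p, k0, k1, F):
    """4-round EM with alternating keys [K0, K1, K0, K1, K0]; F = [F1, ..., F4]."""
    x = p
    for i, k in enumerate((k0, k1, k0, k1)):
        x = F[i][x ^ k]
    return x ^ k0


def previous_attack(pairs, F, F_inv, n_bits, rng, margin=8):
    """Guess K0, peel off rounds 1 and 4, then run SlideX with key K1."""
    N, D = 1 << n_bits, len(pairs)
    budget = min(N, margin * N // D)   # about N/D values, times a safety margin
    for k0 in range(N):                                    # Step 1
        L = {}
        for p, c in pairs:                                 # Step 1.(a)
            x1 = F[0][p ^ k0]                              # X_1 = F1(P xor K0)
            x3p = F_inv[3][c ^ k0]                         # X'_3 = F4^{-1}(C xor K0)
            L.setdefault(x1 ^ x3p, []).append(x1)
        for y1p in rng.sample(range(N), budget):           # Step 1.(b)
            y3 = F[2][F[1][y1p] ^ k0]                      # Y_3 = F3(F2(Y'_1) xor K0)
            for x1 in L.get(y1p ^ y3, []):
                k1 = x1 ^ y1p                              # K1 = X_1 xor Y'_1
                if all(encrypt(pp, k0, k1, F) == cc for pp, cc in pairs):
                    return k0, k1
    return None


rng = random.Random(7)
n = 8
tables = [random_perm(n, rng) for _ in range(4)]
F = [t for t, _ in tables]
F_inv = [inv for _, inv in tables]
k0, k1 = rng.randrange(1 << n), rng.randrange(1 << n)
pairs = [(p, encrypt(p, k0, k1, F)) for p in rng.sample(range(1 << n), 16)]
found = previous_attack(pairs, F, F_inv, n, rng)
print(found == (k0, k1))
```

The outer loop over all N values of $K_0$, each touching all D pairs, is exactly what forces $T \ge N \cdot D$ in this approach.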


Applying a Generalized Version of the Attack to any 2-Key 4-Round
Iterated Even-Mansour Scheme. Before describing our improved attack,
we notice that in a general 4-round iterated EM scheme with 2 keys which can
be used in any order, there is always a key that is added at most twice.⁵ Thus,
the attack of [23] can be easily generalized and applied with the same complexity
to any 4-round iterated EM scheme with 2 keys. The generalized attack works by
guessing the value of the most common key (i.e., the key that is added at least 3
times), partially encrypting (decrypting) the plaintexts (ciphertexts), and thus
obtaining the inputs/outputs of a single-key EM scheme with a single permutation
(which is fully known after guessing the most common key). However, as
we show in the rest of this paper, when $D > N^{1/2}$, more efficient attacks exist
on all 4-round 2-key EM schemes.


3.3 The Basic Version of Our New Multibridge Attack on 4-Round 
Iterated Even-Mansour with Two Alternating Keys 

The approach of the previous attack was to guess $K_0$, and thus "peel off" the first
and last rounds of the 4-round EM scheme with 2 alternating keys. Although
this approach seems natural, it gives the tradeoff curve of $TD = N^2$ only for
$D \le N^{1/2}$, and thus its time complexity is at least $T \ge N^{3/2}$. We now present
our new attack on this scheme, which achieves the same tradeoff for any $D \le N$,
and thus enables us to reduce the time complexity to $T = N$.

Unlike the previous attack, which guessed the value of $K_0$, our attack guesses
the value of some internal state for which a special self-similarity property holds. 
This property allows us to split the cipher into two parts which can be analyzed 
independently. While standard meet-in-the-middle attacks also split the cipher 
into two parts, in our attack the two parts of the cipher are nested (rather than 
concatenated), similarly to attacks based on the splice-and-cut technique [3]. 

5 Schemes in which there is a key that is added only once are very weak (as shown in 
the full version of this paper [8]). 
However, it is interesting to note that while splice-and-cut attacks consider the
first and the last rounds of the cipher as consecutive rounds (i.e., the cipher 
is spliced using the plaintext-ciphertext pairs), here we connect (or bridge) the 
cipher internally and consider as consecutive rounds its two internal ends. 

We begin by describing our multibridge attack for the specific case of $D = N$
(i.e., given the full code-book), for which the attack runs in time $T = N$. In this
case, we look for some plaintext-ciphertext pair $(P^i, C^i)$ with the internal
fixed-point property $X^i_1 = X'^i_3$ (i.e., we connect $X^i_1$ and $X'^i_3$ using a "bridge"). Since
XOR'ing the same key twice leaves the result unchanged, this self-similarity
property also implies that $X'^i_1 = X^i_3$ (indeed, $X'^i_1 = X^i_1 \oplus K_1 = X'^i_3 \oplus K_1 = X^i_3$),
i.e., $X'^i_1$ and $X^i_3$ are now connected using another bridge, which we get "for free".
This allows us to split the cipher into 2 nested parts,⁶ each independently suggesting
a value for the key $K_0$. Finally, the suggestions are merged using a meet-in-the-middle
technique. Note that for a specific plaintext-ciphertext pair, this internal fixed-point
property occurs with probability 1/N, and thus given D = N data, with high probability,
one of the plaintext-ciphertext pairs satisfies this property. The details of the
basic multibridge attack are given below (see Fig. 5):

1. For each of the D = N known plaintext-ciphertext pairs $(P^i, C^i)$:

(a) Calculate $P^i \oplus C^i$, and store it in a sorted list $L_1$, next to $P^i$.

2. For each of the N possible values of $Y_1$:

(a) Compute $Y'_0 = F_1^{-1}(Y_1)$.

(b) Assume that $Y'_3 = Y_1$, and compute $Y_4 = F_4(Y'_3)$.

(c) Compute $Y'_0 \oplus Y_4$ and search for matches with this value in $L_1$.

(d) For each match, obtain $P^i$ and calculate a suggestion for $K_0 = P^i \oplus Y'_0$.
Store all the suggestions in a sorted list $L_2$, next to $Y_1$. We expect $L_2$
to contain about N entries.

3. For each of the N possible values of $Z'_1$ (i.e., the intermediate encryption
value obtained after applying 1 round and adding $K_1$):

(a) Compute $Z_2 = F_2(Z'_1)$.

(b) Assume that $Z_3 = Z'_1$, and compute $Z'_2 = F_3^{-1}(Z_3)$.

(c) Compute $K_0 = Z_2 \oplus Z'_2$ and search for matches in $L_2$. We expect one
match on average for a given value of $K_0$.

(d) For each match, obtain $Y_1$ and calculate a suggestion for $K_1 = Y_1 \oplus Z'_1$.

(e) Test the suggestion for the full key $(K_0, K_1)$ using a trial encryption,
and if it succeeds, return it.

The success of the attack is based on the observation above, namely, given
D = N plaintext-ciphertext pairs $(P^i, C^i)$, then with high probability there
exists an i such that $X^i_1 = X'^i_3$. Since we iterate over all possible values of $Y_1$ in
Step 2 of the attack, then for $Y_1 = X^i_1$, we calculate $Y'_0 \oplus Y_4 = X'^i_0 \oplus X^i_4 = P^i \oplus C^i$
in Step 2.(c). Thus, we get a match with the correct value of $K_0$ in Step 2.(d),
and we store it next to $Y_1 = X^i_1$ in the list $L_2$. Similarly, since we iterate over all
possible values of $Z'_1$, then for $Z'_1 = X'^i_1$ we have $Z_3 = Z'_1 = X'^i_1 = X^i_3$. Hence,
we calculate the correct value of $K_0$ in Step 3.(c), obtain the match with $L_2$ such
that $Y_1 = X^i_1$, and obtain the correct $K_1 = Y_1 \oplus Z'_1 = X^i_1 \oplus X'^i_1$. As a result, we
encounter the correct suggestion for the full key in Step 3.(e) and return it.

6 In fact, as described in the detailed attack, the first part of the cipher is in itself
also composed of 2 parts.

The attack is composed of a sequential execution of 3 main steps, each of which has a
time complexity of N: in Step 1, we perform a simple XOR operation for each
of the D = N plaintext-ciphertext pairs, and allocate the list $L_1$, which is of
size N. In Step 2, we iterate over N possible values of $Y_1$, and for each such
value we expect a single match in $L_1$ in Step 2.(c), implying that the complexity
of Step 2 is N. Finally, since the expected size of $L_2$ is N, for each suggestion
of $K_0$ we expect a single match in Step 3.(c), and thus the time complexity of
Step 3 is N, as claimed. In total, the analysis shows that the time complexity of
the full attack is N, and its memory complexity is N as well.


(Figure panels: "Step 1: for all i, store in $L_1$"; "Step 2(a): for a given $\Delta_s$, for all j,
find in $L_1$, deduce suggested $K_0$, store in $L_2$"; "Step 2(b): for a given $\Delta_s$, for all i,
deduce suggested $K_0$, find in $L_2$, find suggested $K_1$".)

Fig. 5. The Multibridge Attack
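The basic multibridge attack can be sketched in Python on a toy cipher with n = 10 and the full code-book. As before, the permutations are random lookup tables and all names (`basic_multibridge`, `has_bridge`) are hypothetical. Since a bridging pair with $X^i_1 = X'^i_3$ need not exist in a particular toy instance (heuristically about one such pair is expected), the sketch returns None when no pair bridges; `has_bridge` uses the secret keys only to check this on the toy instance.

```python
import random


def random_perm(n_bits, rng):
    """Random permutation table and its inverse."""
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def encrypt(p, k0, k1, F):
    """4-round EM with alternating keys [K0, K1, K0, K1, K0]."""
    x = p
    for i, k in enumerate((k0, k1, k0, k1)):
        x = F[i][x ^ k]
    return x ^ k0


def basic_multibridge(pairs, F, F_inv, n_bits):
    """Basic multibridge attack, bridging X_1 = X'_3 (and hence X'_1 = X_3)."""
    N = 1 << n_bits
    L1 = {}
    for p, c in pairs:                           # Step 1: P xor C -> [P]
        L1.setdefault(p ^ c, []).append(p)
    L2 = {}
    for y1 in range(N):                          # Step 2: outer rounds 1 and 4
        y0p = F_inv[0][y1]                       # Y'_0 = F1^{-1}(Y_1)
        y4 = F[3][y1]                            # Y'_3 = Y_1, Y_4 = F4(Y'_3)
        for p in L1.get(y0p ^ y4, []):
            L2.setdefault(p ^ y0p, []).append(y1)    # K0 = P xor Y'_0 -> [Y_1]
    for z1p in range(N):                         # Step 3: inner rounds 2 and 3
        z2 = F[1][z1p]                           # Z_2 = F2(Z'_1)
        z2p = F_inv[2][z1p]                      # Z_3 = Z'_1, Z'_2 = F3^{-1}(Z_3)
        k0 = z2 ^ z2p
        for y1 in L2.get(k0, []):
            k1 = y1 ^ z1p                        # K1 = Y_1 xor Z'_1
            if all(encrypt(p, k0, k1, F) == c for p, c in pairs):
                return k0, k1
    return None


rng = random.Random(11)
n = 10
tables = [random_perm(n, rng) for _ in range(4)]
F, F_inv = [t for t, _ in tables], [inv for _, inv in tables]
k0, k1 = rng.randrange(1 << n), rng.randrange(1 << n)
pairs = [(p, encrypt(p, k0, k1, F)) for p in range(1 << n)]   # full code-book


def has_bridge(p):
    """Uses the secret keys, only to check whether the toy instance has X_1 = X'_3."""
    x1 = F[0][p ^ k0]
    x3 = F[2][F[1][x1 ^ k1] ^ k0]
    return x1 == x3 ^ k1


found = basic_multibridge(pairs, F, F_inv, n)
bridge_exists = any(has_bridge(p) for p in range(1 << n))
print(bridge_exists, found == (k0, k1))
```

Each of the three loops runs over at most N values, and $L_1$ and $L_2$ each hold about N entries, matching the time and memory complexity N stated above.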


3.4 Our Generalized Multibridge Attack on 4-Round Iterated 
Even-Mansour with Two Alternating Keys 

Given D < N data, we do not expect to have a plaintext-ciphertext pair that
satisfies the internal fixed-point property. In order to generalize the attack for
any $D \le N$, we first notice that the internal fixed-point property $X^i_1 = X'^i_3$ can
be replaced by the more general "bridging" property $X^i_1 = X'^i_3 \oplus \Delta$, for any fixed
known value of $\Delta$ (the previously described fixed-point property is the special
case of $\Delta = 0$).⁷ As before, this property also implies $X'^i_1 = X^i_3 \oplus \Delta$. Thus, in
Step 2.(b) we calculate $Y'_3 = Y_1 \oplus \Delta$, and similarly in Step 3.(b) we calculate
$Z_3 = Z'_1 \oplus \Delta$.

When we fix one value of $\Delta$, we expect to have a pair $(P^i, C^i)$ such that
$X^i_1 = X'^i_3 \oplus \Delta$ with probability of about D/N. Thus, in order to recover the key
with high probability, we randomly choose N/D different values of $\Delta$, indexed
by $\Delta_s$, and run a variant of the fixed-point multibridge attack independently for
each value. This is a similar approach to the one used in [?] in order to extend
the SlideX attack on 1-round 2-key EM to all $D \le N^{1/2}$. The details of the
generalized multibridge attack are given below:

1. For each of the D plaintext-ciphertext pairs $(P^i, C^i)$:

(a) Calculate $P^i \oplus C^i$, and store it in a sorted list $L_1$, next to $P^i$.

2. For N/D arbitrary values of $\Delta_s$:

(a) Apply a variant of the basic multibridge attack using $\Delta_s$.

As we execute a variant of the fixed-point attack N/D times, each with time
complexity N, the expected time complexity of the attack is $N^2/D$. The size of
the list $L_1$ is D, implying that the size of $L_2$ (the second list allocated in the
multibridge attack) is D as well, and thus the memory complexity of the attack is D.
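Building on the basic version, a sketch of the generalized attack for D < N follows, again on a hypothetical toy cipher with n = 10 and D = 64 (names such as `multibridge_variant` and `generalized_multibridge` are illustrative). Step 1 is computed once and shared across runs, and each run uses the bridge $X^i_1 = X'^i_3 \oplus \Delta_s$. As in the SlideX sketch, the values $\Delta_s$ are taken without repetition from a shuffled list of all N values (an assumption of this sketch, not of the paper), which guarantees success while about N/D runs are expected.

```python
import random


def random_perm(n_bits, rng):
    """Random permutation table and its inverse."""
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def encrypt(p, k0, k1, F):
    """4-round EM with alternating keys [K0, K1, K0, K1, K0]."""
    x = p
    for i, k in enumerate((k0, k1, k0, k1)):
        x = F[i][x ^ k]
    return x ^ k0


def multibridge_variant(L1, pairs, F, F_inv, N, delta):
    """One run with the bridge X_1 = X'_3 xor delta (so X'_1 = X_3 xor delta)."""
    L2 = {}
    for y1 in range(N):
        y0p = F_inv[0][y1]                       # Y'_0 = F1^{-1}(Y_1)
        y4 = F[3][y1 ^ delta]                    # Y'_3 = Y_1 xor delta
        for p in L1.get(y0p ^ y4, []):
            L2.setdefault(p ^ y0p, []).append(y1)    # K0 suggestion -> [Y_1]
    for z1p in range(N):
        k0 = F[1][z1p] ^ F_inv[2][z1p ^ delta]   # Z_2 xor Z'_2, with Z_3 = Z'_1 xor delta
        for y1 in L2.get(k0, []):
            k1 = y1 ^ z1p                        # K1 = Y_1 xor Z'_1
            if all(encrypt(p, k0, k1, F) == c for p, c in pairs):
                return k0, k1
    return None


def generalized_multibridge(pairs, F, F_inv, n_bits, rng):
    N = 1 << n_bits
    L1 = {}
    for p, c in pairs:                           # Step 1 (shared by all runs)
        L1.setdefault(p ^ c, []).append(p)
    deltas = list(range(N))
    rng.shuffle(deltas)                          # sketch choice: no repeated Delta_s
    for runs, delta in enumerate(deltas, start=1):   # Step 2
        key = multibridge_variant(L1, pairs, F, F_inv, N, delta)
        if key is not None:
            return key, runs
    return None, N


rng = random.Random(5)
n, D = 10, 64
tables = [random_perm(n, rng) for _ in range(4)]
F, F_inv = [t for t, _ in tables], [inv for _, inv in tables]
k0, k1 = rng.randrange(1 << n), rng.randrange(1 << n)
pairs = [(p, encrypt(p, k0, k1, F)) for p in rng.sample(range(1 << n), D)]
found, runs = generalized_multibridge(pairs, F, F_inv, n, rng)
print(found == (k0, k1), runs)   # runs is typically of the order of N/D = 16
```

Every pair bridges for exactly one value of $\Delta$, so with D pairs roughly D of the N candidates succeed, which is where the expected N/D runs come from.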

We conclude by noting that this attack can also be applied directly to the
attack of Merkle and Hellman against 2K3DES [?]. The resulting attack is essentially
the known plaintext variant of van Oorschot and Wiener [29] to Merkle
and Hellman's attack, i.e., an attack on 2K3DES with D known plaintexts and
running time of $N^2/D$.


3.5 Application to 4-Step LED-128 

LED is a 64-bit lightweight iterated EM block cipher, proposed at CHES 2011
[?]. The cipher has two main variants: a one-key version called LED-64, and a
two-key version called LED-128. We concentrate on the 128-bit variant, which
has 12 steps, in which the two keys are alternately used. The best previously
known attack on 4-step LED-128 was described in [23] (and also described in
Section 3.2 for a general 4-step EM cipher with alternating keys), and gives a
tradeoff of $TD = 2^{128}$, but only for $D \le 2^{32}$. We can directly apply our improved
attack, described in Section 3.4, to 4-step LED-128, and obtain the tradeoff of
$TD = 2^{128}$ for any $D \le 2^{64}$. Thus, we improve the time complexity of the best
known attack on this scheme from $2^{96}$ to $2^{64}$.

We note that recently, up to 8 steps of the 2-key alternating EM scheme have
been attacked faster than exhaustive search (see [9]). However, all the known
attacks on more than 4 steps are marginal in the sense that they improve the
time complexity of exhaustive search only by a logarithmic factor in N, and
thus our new attack on the 4-step version of LED-128 is currently the best
non-marginal attack on this scheme.

7 Thus, we do not exploit the actual fixed point in a strong way (such as in [?]), but
merely some fixed linear relation between $X_1$ and $X'_3$.


3.6 Application to Reflection Cryptanalysis 

Reflection cryptanalysis was introduced by Kara in [?] as a self-similarity attack
on GOST and related block ciphers, and generalized to a statistical attack on
a broader class of ciphers (called "PRINCE-like" ciphers) by Soleimany et al.
in [28]. A PRINCE-like cipher is designed to have a specific symmetry property
around its middle round, called α-reflection.⁸ The definition and analysis of
PRINCE-like ciphers in [28] was inspired by the block cipher PRINCE [5], which
used the α-reflection property in order to realize decryption on top of encryption
with a negligible additional cost.

In reflection cryptanalysis of PRINCE-like ciphers, we consider the encryption
process of a single plaintext, and study the difference between its internal
encryption values which are symmetric with respect to the middle round of the
cipher. The goal is to iteratively construct a reflection distinguisher, which is a
strong non-random property, likely to be present in several rounds of PRINCE-like
ciphers (as shown in [28]). In particular, a reflection distinguisher on r
rounds of the cipher (denoted by $E_K$) gives a specific value of $\Delta$ for which
$\Pr(X \oplus E_K(X) = \Delta) > 2^{-n}$ (where the probability is taken over the input X).

In this section, we present a variant of the multibridge attack as a generic
key-recovery method for reflection cryptanalysis. This attack can be considered as
the reflection cryptanalysis counterpart of the key-recovery attack of Daemen [6]
for differential cryptanalysis of ciphers based on the Even-Mansour construction.
The attack assumes that we have a reflection distinguisher with probability
$p > 2^{-n}$ on r rounds of the cipher, and recovers the secret key for a total of r + 2
rounds, by adding one round at the beginning and one round at the end (i.e., the
reflection distinguisher covers rounds 2, 3, ..., r + 1). For the sake of simplicity,
we first assume that the cipher is a single-key iterated Even-Mansour scheme,
where the secret key is denoted by K. We now describe the attack, assuming
that we obtain D plaintext-ciphertext pairs, such that $D \ge p^{-1}$.

1. For $2^n/(p \cdot D)$ arbitrary values of $Y'_0$:

(a) Compute $Y_1 = F_1(Y'_0)$.

(b) Assume that $Y'_{r+1} = Y_1 \oplus \Delta$ (where the value of $\Delta$ is given by the
reflection distinguisher), and compute $Y_{r+2} = F_{r+2}(Y'_{r+1})$.

(c) Store $Y'_0 \oplus Y_{r+2}$ in a sorted list L, next to $Y'_0$.

2. For each of the D plaintext-ciphertext pairs $(P^i, C^i)$:

(a) Compute $P^i \oplus C^i$, and search the list L for matches.

(b) For each match, obtain $Y'_0$, and calculate a suggestion for $K = P^i \oplus Y'_0$.

(c) Test the suggestion for the key K using a trial encryption, and if it
succeeds, return it.

8 If we denote by $E_K$ the encryption of r rounds in the middle of the cipher under
the key K, then the α-reflection property (for a fixed value of α) states that for any
input X, $E_K(X) = E^{-1}_{K \oplus \alpha}(X)$.

We have $D \ge p^{-1}$ plaintext-ciphertext pairs, out of which $p \cdot D \ge 1$ are
expected to satisfy the reflection characteristic. As we evaluate $2^n/(p \cdot D)$ values
of $Y'_0$ in Step 1 of the attack, according to the birthday paradox, we expect at
least one match between $Y'_0$ and $P^i \oplus K$ such that $(P^i, C^i)$ satisfies the reflection
property. Once we obtain such a match (i.e., $Y'_0 = P^i \oplus K$), we recover the correct
key in Step 2.(c).

As we expect at most one match in L in Step 2.(a) for an arbitrary $(P^i, C^i)$,
the time complexity of the attack is $\max(D, 2^n/(p \cdot D))$. The time complexity
is minimized to $2^{n/2} \cdot p^{-1/2}$ by choosing $D = 2^{n/2} \cdot p^{-1/2}$ (note that it is not
reasonable to exploit more than $2^{n/2} \cdot p^{-1/2}$ data). The memory complexity of
the attack is $2^n/(p \cdot D)$, but can be easily reduced to D by exchanging the order
of Steps 1 and 2 of the attack.
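The two identities behind Steps 1.(b) and 2.(a), and the choice of D, can be written out as follows. This assumes, as a reading of the text, that the distinguisher's input is $X'_1$ and its output is $X_{r+1}$ (the state before the key addition that precedes round r + 2), in a single-key scheme with key K:

```latex
\begin{aligned}
&X'_1 \oplus X_{r+1} = \Delta
  \;\Longrightarrow\; X'_{r+1} = X_{r+1} \oplus K = X'_1 \oplus \Delta \oplus K = X_1 \oplus \Delta,\\
&P \oplus C = (X'_0 \oplus K) \oplus (X_{r+2} \oplus K) = X'_0 \oplus X_{r+2},\\
&T = \max\!\left(D, \frac{2^n}{p D}\right) \text{ is minimized when } D = \frac{2^n}{p D},
  \text{ i.e. } D = T = 2^{n/2} p^{-1/2}.
\end{aligned}
```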

In order to apply the attack to more complex key schedules, the attacker 
can exploit the internal properties of the reflection distinguisher to recover more 
key material (perhaps using more data, or function evaluations in Step 1 of the 
attack). However, this extension is highly dependent on the internal properties 
of the cipher, and is thus out of the scope of this paper. 


4 Classification and Summary of Our Attacks on All 
4-Round 2-Key Iterated Even-Mansour Schemes 

In the rest of the paper, we analyze all the remaining iterated EM schemes
with 4 rounds and 2 keys, and show that the best attack on each one of them
has a time complexity of N. We begin by noting that each such construction
can be described by a sequence of 5 keys, which specifies the order in which
the keys $K_0$ and $K_1$ are added (over GF(2)) to the internal state. For example,
we denote the 4-round EM scheme with alternating keys (of Fig. 2) by
$[K_0, K_1, K_0, K_1, K_0]$. Clearly, each such scheme has an equivalent representation
which is obtained by renaming the keys $K_0$ and $K_1$ (e.g., $[K_0, K_0, K_1, K_1, K_0]$ is
equivalent to $[K_1, K_1, K_0, K_0, K_1]$). In addition, since our attacks assume that
the public permutations $F_i$ (and $F_i^{-1}$) are chosen at random (i.e., we do not
exploit any special properties of the public permutations), from a cryptanalytic
point of view, the roles of encryption and decryption can be exchanged.
Namely, if we reverse the order in which the keys are added, we get an equivalent
scheme. For example, the scheme $[K_0, K_0, K_1, K_1, K_0]$ is equivalent to
$[K_0, K_1, K_1, K_0, K_0]$, since any attack on $[K_0, K_0, K_1, K_1, K_0]$ can also be
applied to $[K_0, K_1, K_1, K_0, K_0]$ (by reversing the roles of encryption and decryption),
and vice versa. Altogether, the scheme $[K_0, K_0, K_1, K_1, K_0]$ belongs to an
equivalence class (EC) with 4 members, containing the 3 additional schemes
$[K_1, K_1, K_0, K_0, K_1]$, $[K_0, K_1, K_1, K_0, K_0]$ and $[K_1, K_0, K_0, K_1, K_1]$. Since any
attack on a member of an EC is applicable to its other members, we only need
to describe an attack on a representative of the EC.
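The equivalence classes can be enumerated mechanically. The following Python sketch (function names hypothetical) groups all 5-key sequences that use both keys under renaming and reversal; it finds 9 classes covering 30 sequences, consistent with the nine ECs of Table 2. Note that it prints the lexicographically smallest member of each class, which need not coincide with the representatives chosen in Table 2.

```python
from itertools import product


def canonical(seq):
    """Smallest member of the class of `seq` under renaming K0<->K1 and reversal."""
    renamed = tuple(1 - k for k in seq)
    return min(seq, seq[::-1], renamed, renamed[::-1])


def equivalence_classes(length=5):
    """Group all key sequences that use both keys into equivalence classes."""
    classes = {}
    for seq in product((0, 1), repeat=length):
        if len(set(seq)) == 2:                   # both K0 and K1 appear
            classes.setdefault(canonical(seq), []).append(seq)
    return classes


def show(seq):
    """Render a tuple such as (0, 1, 0) as "[K0, K1, K0]"."""
    return "[" + ", ".join(f"K{k}" for k in seq) + "]"


classes = equivalence_classes()
for rep, members in sorted(classes.items()):
    print(show(rep), "->", ", ".join(show(m) for m in members))
```

Calling `equivalence_classes(4)` applies the same grouping to 3-round schemes (sequences of 4 keys).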

Table 2 lists the equivalence classes of all the 4-round 2-key iterated EM
schemes, next to the complexities of our best attacks. For the sake of simplicity,
we will refer to each EC as a single scheme, using its ID as described in
Table 2. For example, our attack on the schemes of the first EC is simply referred
to as an attack on the "EC1 scheme", whose representative is $[K_0, K_1, K_1, K_1, K_1]$.

The attack on EC7, which is 4-round EM with alternating keys, was already
described in Section 3.4. In the next section we present the most complex
multibridge attacks, on the classes EC8 and EC9. The simpler attacks on EC1-EC6
are presented in the full version of this paper [8].


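Table 2. Classification and Attacks on Iterated Even-Mansour Schemes with Four
Rounds and Two Keys

EC ID | EC Representative           | Reference   | Data                  | Time      | Memory
EC1   | $[K_0, K_1, K_1, K_1, K_1]$ | [8]         | $O(1)$                | $N$       | $O(1)$
EC2   | $[K_0, K_1, K_0, K_0, K_0]$ | [8]         | $O(1)$                | $N$       | $O(1)$
EC3   | $[K_0, K_0, K_1, K_0, K_0]$ | [8]         | $O(1)$                | $N$       | $O(1)$
EC4   | $[K_0, K_0, K_1, K_1, K_1]$ | [8]         | $O(1)$                | $N$       | $N$
EC5   | $[K_0, K_1, K_1, K_0, K_0]$ | [8]         | $O(1)$                | $N$       | $N$
EC6   | $[K_0, K_1, K_1, K_1, K_0]$ | [8]         | $D \le N$             | $N^2/D$   | $D$
EC7   | $[K_0, K_1, K_0, K_1, K_0]$ | Section 3.4 | $D \le N$             | $N^2/D$   | $D$
EC8   | $[K_0, K_1, K_0, K_1, K_1]$ | Section 5.1 | $D \le N$             | $N^2/D$   | $D$
EC9   | $[K_0, K_1, K_0, K_0, K_1]$ | Section 5.2 | $D \le N^{1/2}$       | $N^2/D$   | $D$
      |                             |             | $N^{1/2} \le D \le N$ | $N^2/D$   | $N$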


Each EC (equivalence class) is described using an ID and a representative scheme. 


Classification and Attacks on All 3-Round 2-Key Iterated Even-Mansour
Schemes. We did not find any cryptanalytic techniques which are
specifically applicable to 3-round 2-key EM schemes. However, for the sake of
completeness, we also classify all 3-round 2-key iterated EM schemes and specify
which variant of our 4-round attacks can be used to break each of them (with the same
complexity parameters).

1. $[K_0, K_1, K_1, K_1]$ and $[K_0, K_1, K_0, K_0]$ can be broken with a variant of the
attack on EC1.

2. $[K_0, K_1, K_1, K_0]$ can be broken with a variant of the attack on EC4.

3. $[K_0, K_1, K_0, K_1]$ can be broken with a variant of the attack on EC7.


5 Multibridge Attacks on EC8 and EC9 

In this section we consider the schemes EC8 and EC9, and show that they can
be attacked with complexity $DT = N^2$, for all $D \le N$. The attacks on these
schemes use the same general multibridge technique as our previous attack on
EC7 in Section 3, namely, we use a generalized version of the internal fixed-point
property in order to internally bridge different parts of the cipher. Finally,
the suggestions for the key obtained from the two parts are merged using a
meet-in-the-middle technique.

5.1 A Multibridge Attack on EC8 

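In order to attack the scheme $[K_0, K_1, K_0, K_1, K_1]$, we look for a plaintext-ciphertext
pair $(P^i, C^i)$ such that $X'^i_2 = P^i \oplus \Delta_s$ (for arbitrary values of $\Delta_s$).
Since $X^i_2 = X'^i_2 \oplus K_0$ and $X'^i_0 = P^i \oplus K_0$, this also implies $X^i_2 = X'^i_0 \oplus \Delta_s$.
The details of the multibridge attack on EC8 are given below: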

1. For N/D arbitrary values of $\Delta_s$:

(a) For each of the D plaintext-ciphertext pairs $(P^i, C^i)$:

i. Assume that $X'^i_2 = P^i \oplus \Delta_s$ and compute $X^i_3 = F_3(X'^i_2)$.

ii. Compute $X^i_3 \oplus C^i$ and store it in a sorted list $L_1$, next to $C^i$.

(b) For each of the N possible values of $Y'_3$:

i. Compute $Y_4 = F_4(Y'_3)$.

ii. Compute $Y'_3 \oplus Y_4$, and search for matches in $L_1$.

iii. For each match, obtain $C^i$, compute a suggestion for $K_1 = C^i \oplus Y_4$,
and store the suggestion in a sorted list $L_2$, next to $P^i$.

(c) For each of the N possible values of $Z'_0$:

i. Compute $Z_1 = F_1(Z'_0)$.

ii. Assume that $Z_2 = Z'_0 \oplus \Delta_s$, and compute $Z'_1 = F_2^{-1}(Z_2)$.

iii. Compute a suggestion for $K_1 = Z_1 \oplus Z'_1$ and search for it in the list
$L_2$.

iv. For each match, obtain $P^i$ and compute a suggestion for $K_0 = P^i \oplus Z'_0$.

v. Test the full key $(K_0, K_1)$ using a trial encryption, and if it succeeds,
return it.

The analysis of the attack is very similar to the analysis of our general
multibridge attack in Section 3.4, and thus given $D \le N$ known plaintext-ciphertext
pairs, its time complexity is $N^2/D$ and its memory complexity is D.
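A toy Python sketch of the EC8 attack, under the same assumptions as the earlier sketches (random lookup-table permutations, n = 10, D = 64, hypothetical names `encrypt_ec8` and `attack_ec8`, and distinct values $\Delta_s$ drawn from a shuffled list of all N values so that the bridging value of some pair is always reached):

```python
import random


def random_perm(n_bits, rng):
    """Random permutation table and its inverse."""
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def encrypt_ec8(p, k0, k1, F):
    """EC8 scheme [K0, K1, K0, K1, K1]."""
    x = p
    for i, k in enumerate((k0, k1, k0, k1)):
        x = F[i][x ^ k]
    return x ^ k1


def attack_ec8(pairs, F, F_inv, n_bits, rng):
    N = 1 << n_bits
    deltas = list(range(N))
    rng.shuffle(deltas)        # sketch choice: distinct Delta_s, about N/D tried
    for delta in deltas:                                  # Step 1
        L1 = {}
        for p, c in pairs:                                # 1.(a): X'_2 = P xor Delta_s
            x3 = F[2][p ^ delta]                          # X_3 = F3(X'_2)
            L1.setdefault(x3 ^ c, []).append((p, c))
        L2 = {}
        for y3p in range(N):                              # 1.(b): round 4
            y4 = F[3][y3p]                                # Y_4 = F4(Y'_3)
            for p, c in L1.get(y3p ^ y4, []):
                L2.setdefault(c ^ y4, []).append(p)       # K1 = C xor Y_4 -> [P]
        for z0p in range(N):                              # 1.(c): rounds 1 and 2
            z1 = F[0][z0p]                                # Z_1 = F1(Z'_0)
            z1p = F_inv[1][z0p ^ delta]                   # Z_2 = Z'_0 xor Delta_s
            k1 = z1 ^ z1p                                 # K1 = Z_1 xor Z'_1
            for p in L2.get(k1, []):
                k0 = p ^ z0p                              # K0 = P xor Z'_0
                if all(encrypt_ec8(pp, k0, k1, F) == cc for pp, cc in pairs):
                    return k0, k1
    return None


rng = random.Random(8)
n, D = 10, 64
tables = [random_perm(n, rng) for _ in range(4)]
F, F_inv = [t for t, _ in tables], [inv for _, inv in tables]
k0, k1 = rng.randrange(1 << n), rng.randrange(1 << n)
pairs = [(p, encrypt_ec8(p, k0, k1, F)) for p in rng.sample(range(1 << n), D)]
found = attack_ec8(pairs, F, F_inv, n, rng)
print(found == (k0, k1))
```

Both lists hold about D entries per value of $\Delta_s$, in line with the memory complexity D stated above.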


5.2 A Multibridge Attack on EC9 

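In order to attack the scheme $[K_0, K_1, K_0, K_0, K_1]$, we look for a plaintext-ciphertext
pair $(P^i, C^i)$ such that $X^i_1 = C^i \oplus \Delta_s$ (for arbitrary values of $\Delta_s$).
Since $X'^i_1 = X^i_1 \oplus K_1$ and $X^i_4 = C^i \oplus K_1$, this also implies $X'^i_1 = X^i_4 \oplus \Delta_s$.
The details of the multibridge attack on EC9 are given below: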

1. For N/D arbitrary values of $\Delta_s$:

(a) For each of the D plaintext-ciphertext pairs $(P^i, C^i)$:

i. Assume that $X^i_1 = C^i \oplus \Delta_s$ and compute $X'^i_0 = F_1^{-1}(X^i_1)$.

ii. Compute a suggestion for $K_0 = X'^i_0 \oplus P^i$ and store it in a sorted list
$L_1$, next to $X^i_1$.

(b) For each of the N possible values of $Y'_1$:

i. Compute $Y_2 = F_2(Y'_1)$.

ii. Assume that $Y_4 = Y'_1 \oplus \Delta_s$ and compute $Y'_3 = F_4^{-1}(Y_4)$.

iii. Compute $Y_2 \oplus Y'_3$ and store this value in a sorted list $L_2$, next to
$Y'_1$ and $Y_2$.

(c) For each of the N possible values of $Z'_2$:

i. Compute $Z_3 = F_3(Z'_2)$.

ii. Compute $Z'_2 \oplus Z_3$ and search for it in the list $L_2$.

iii. For each match, obtain $Y_2$ (and $Y'_1$), compute a suggestion for $K_0 =
Y_2 \oplus Z'_2$, and search for it in the sorted list $L_1$.

iv. For each match, obtain $X^i_1$ and compute a suggestion for $K_1 = X^i_1 \oplus Y'_1$.

v. Test the full key $(K_0, K_1)$ using a trial encryption, and if it succeeds,
return it.
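Similarly, the EC9 steps can be sketched on a toy instance (same hypothetical assumptions as the EC8 sketch; names `encrypt_ec9` and `attack_ec9` are illustrative). Note that the list $L_2$ is rebuilt with N entries for each $\Delta_s$, which is where the memory complexity of N for this scheme comes from.

```python
import random


def random_perm(n_bits, rng):
    """Random permutation table and its inverse."""
    table = list(range(1 << n_bits))
    rng.shuffle(table)
    inverse = [0] * len(table)
    for x, y in enumerate(table):
        inverse[y] = x
    return table, inverse


def encrypt_ec9(p, k0, k1, F):
    """EC9 scheme [K0, K1, K0, K0, K1]."""
    x = p
    for i, k in enumerate((k0, k1, k0, k0)):
        x = F[i][x ^ k]
    return x ^ k1


def attack_ec9(pairs, F, F_inv, n_bits, rng):
    N = 1 << n_bits
    deltas = list(range(N))
    rng.shuffle(deltas)        # sketch choice: distinct Delta_s, about N/D tried
    for delta in deltas:                                  # Step 1
        L1 = {}
        for p, c in pairs:                                # 1.(a): X_1 = C xor Delta_s
            x1 = c ^ delta
            L1.setdefault(F_inv[0][x1] ^ p, []).append(x1)   # K0 = X'_0 xor P -> [X_1]
        L2 = {}                                           # about N entries: memory N
        for y1p in range(N):                              # 1.(b): rounds 2 and 4
            y2 = F[1][y1p]                                # Y_2 = F2(Y'_1)
            y3p = F_inv[3][y1p ^ delta]                   # Y_4 = Y'_1 xor Delta_s
            L2.setdefault(y2 ^ y3p, []).append((y1p, y2))
        for z2p in range(N):                              # 1.(c): round 3
            z3 = F[2][z2p]                                # Z_3 = F3(Z'_2)
            for y1p, y2 in L2.get(z2p ^ z3, []):
                k0 = y2 ^ z2p                             # K0 = Y_2 xor Z'_2
                for x1 in L1.get(k0, []):
                    k1 = x1 ^ y1p                         # K1 = X_1 xor Y'_1
                    if all(encrypt_ec9(pp, k0, k1, F) == cc for pp, cc in pairs):
                        return k0, k1
    return None


rng = random.Random(9)
n, D = 10, 64
tables = [random_perm(n, rng) for _ in range(4)]
F, F_inv = [t for t, _ in tables], [inv for _, inv in tables]
k0, k1 = rng.randrange(1 << n), rng.randrange(1 << n)
pairs = [(p, encrypt_ec9(p, k0, k1, F)) for p in rng.sample(range(1 << n), D)]
found = attack_ec9(pairs, F, F_inv, n, rng)
print(found == (k0, k1))
```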

Similarly to the multibridge attacks on EC7 and EC8, the time complexity
of the attack is $N^2/D$ for any $D \le N$, as the time complexity of each of the
Steps 1.(a), 1.(b) and 1.(c) is at most N. However, unlike the previous attacks, which had
a reduced memory complexity of D, the list $L_2$ contains N elements, and thus the
memory complexity of this attack is N. As a result, when $D \le N^{1/2}$, the most
efficient attack on this scheme is the generalized version of the attack presented
in Section 3.2, which has the same running time but requires less memory.

We note that in cases where $D > N^{1/2}$, but the available memory M satisfies
$D \le M < N$, it is possible to obtain a tradeoff between the memory and time
complexities of the attack. Although in this paper we mainly consider tradeoffs
between data and time, an interesting open question is whether it is possible to
reduce the memory complexity of the attack for $D > N^{1/2}$ without increasing
its time complexity.

6 Conclusions and Open Problems 

In this paper, we studied the security of iterated Even-Mansour schemes with two
keys. We showed that all such schemes with at most 4 rounds provide security
of at most $2^n$ (compared to the $2^{2n}$ complexity of exhaustive key search). Our
theoretical results allowed us to reduce the complexity of the best known attack
on 4-step LED-128 from $2^{96}$ to $2^{64}$, and to develop a generic key-recovery tool
for reflection cryptanalysis. In order to obtain these results, we developed the
novel multibridge technique, which combines the advantages of the dissection [7]
and the splice-and-cut [3] techniques.

We conclude this paper with a list of several open problems and research 
directions which arise naturally from the results of our paper. 

1. Finding Better Attacks on 3-round EM with Two Keys. Using our
techniques, we could not find attacks on 3-round EM with alternating keys
which are better than the attacks on 4-round EM with alternating keys. If
such attacks indeed do not exist, then there is no security gain in adding
a round to the 3-round EM scheme. Such a situation is somewhat unusual,
and hence one may anticipate that better attacks exist on 3-round EM
with alternating keys. We note that this is a similar scenario to cascade
encryption, where the complexity of the best attack on 3-encryption is the
same as the complexity of the best attack on 4-encryption [7]. However, in
cascade encryption, the complexities are equal only for the specific attacks
that minimize the time complexity, while in our case, the complexities are
the same for all attacks on the tradeoff curve.

2. Finding the Minimal Number r for which r-round EM with Two
Keys Provides 2n-bit Security. This is an interesting research direction
whose equivalent has been extensively studied in the domain of Feistel
constructions (see [20, 25, 26]). In the case of EM with two keys, we are not aware
of any attacks on the 5-round alternating-key scheme which improve over
exhaustive search by a significant factor. On the other hand, when considering
relatively small (polynomial in n) improvements over exhaustive search, up
to 8 rounds can be broken (see [9]), but no attacks at all are known for $r \ge 9$
rounds. Clearly, this fundamental question can be generalized to more keys,
namely, what is the minimal number of rounds for which mn-bit security can
be achieved for n-bit iterated EM constructions with m independent keys?

3. Other Attack Models. In this paper, we concentrated on attacks in the
most conservative model, in which the adversary has access only to known
plaintexts, and the complexity of the attack takes into consideration all
operations (including a potential preprocessing stage). It would be interesting
to see whether the complexities of the attacks can be reduced in other
models, where chosen or even adaptively chosen plaintext queries are allowed,
and perhaps precomputation is not counted in the overall complexity of the
attack. We note that in a recent work of Joux and Fouque [?], such improved
attacks were found for the 1-round EM construction with two keys,
suggesting that similar results may be possible for iterated EM with two
keys as well.

4. Considering Memory Complexity. As in all previous papers on iterated
EM, we concentrated in this paper on tradeoffs between data and time
complexities, assuming that we always have enough memory to apply the most
efficient attack. It would be interesting to consider more general tradeoffs
between data, memory and time complexities, and in particular, to minimize
the memory complexity for which the (presumably) optimal curve $DT = 2^{2n}$
can be obtained. We note that a similar question with respect to 1-round
EM was asked in [?] and partially answered in [12].

5. More Complex Key Schedules. As stated in the introduction, iterated EM schemes can be considered with a wide variety of key schedules, generating an endless field of research. However, even when restricted to schemes with two keys, as we do in this paper, one may consider more complex key schedules in which combinations of the keys $K_0$ and $K_1$ can be used as round keys. It seems that the attacks presented in this paper cannot target such key schedules; for example, we could not find an attack of complexity $2^n$ on 4-round EM with the keys $[K_0, K_1, K_0, K_1, K_0 \oplus K_1]$. Hence, it will be interesting to find new techniques that are able to handle such key schedules, or to show lower bounds on the security of the respective iterated EM schemes.
Abstract. We show key recovery attacks on generic balanced Feistel ciphers. The analysis is based on the meet-in-the-middle technique and exploits truncated differentials that are present in the ciphers due to the Feistel construction. Depending on the type of round function, we differentiate and show attacks on two types of Feistels. For the first type, which is the most general Feistel, we show a 5-round distinguisher (based on a truncated differential), which allows us to launch 6-round and 10-round attacks, for single-key and double-key sizes, respectively. For the second type, we assume the round function follows the SPN structure with a linear layer P that has a maximal branch number, and based on a 7-round distinguisher, we show attacks that reach up to 14 rounds. Our attacks outperform all the known attacks for any key sizes, have been experimentally verified (implemented on a regular PC), and provide new lower bounds on the number of rounds required to achieve a practical and a secure Feistel.

Keywords: Feistel, generic attack, key recovery, meet-in-the-middle. 


1 Introduction 

A Feistel network m is a scheme that builds $n$-bit permutations from smaller, usually $n/2$-bit, permutations or functions. In ciphers based on the Feistel network, both the encryption and the decryption algorithms can be achieved with the use of a single scheme; thus such ciphers exhibit an obvious implementation advantage. The Feistel-based design approach is widely trusted and has a long history of usage in block ciphers. In particular, a number of current and former international or national block cipher standards, such as DES [6], Triple-DES [19], Camellia [2], and CAST [5], are Feistels. In addition to the standard block ciphers, the Feistel construction is an attractive choice for many lightweight ciphers, for instance the recent NSA proposal SIMON [3], LBlock [26], Piccolo [24], etc. The application of the Feistel construction is not limited to ciphers: it has been used to design other crypto primitives, such as the hash function SHAvite-3 [4], the CAESAR authenticated-encryption proposal LAC [27], and others.

The analysis of Feistel primitives and their provable security bounds depend on the type of the round function implemented. Luby and Rackoff m have shown that an $n$-bit pseudorandom permutation can be constructed from an $n/2$-bit pseudorandom function with a 3-round Feistel network. In this construction, the round functions are chosen uniformly at random from a family of $2^{n/2 \cdot 2^{n/2}}$ functions, a set that can be enumerated with $(n/2) \cdot 2^{n/2}$-bit keys. Later, Knudsen [20] considered a practical model, in which the round functions are chosen from a family of $2^k$ functions, and showed a generic attack on up to 6 rounds. Knudsen's construction was coined Feistel-1 by Isobe and Shibutani in [18] to reflect the fact that it is the most general type of Feistel. They further introduced the term Feistel-2 to denote ciphers in which the round functions are composed of an XOR of a subkey followed by an application of a public function or permutation. Generic attacks on Feistel-2 such as impossible differentials [20], all-subkey recovery mm, and integral-like attacks [25] penetrate up to 6 rounds when the key size equals the state size, and up to 9 rounds when the key is twice as large as the block. Better attacks have been published, but they target the so-called Feistel-3, which has round functions based on a substitution-permutation network (SPN), i.e. the rounds start with an XOR of a subkey, followed by a layer of S-Boxes and a linear diffusion layer. The attacks on Feistel-3 presented in [18] reach up to 7 rounds for equal key and state sizes, and 11 rounds for keys twice as large.

We present attacks on Feistel-2 and Feistel-3 ciphers based on the meet-in-the-middle cryptanalytic technique. Its most basic form corresponds to the textbook case of Double-DES [22], and in the past few years, a few improvements have been proposed for more specific cases; for instance, Dinur et al. im have generalized the attack on Double-DES to the case when multiple encryption (more than two $n$-bit keys) is used. Besides the applications to preimage attacks on hash functions [T1ITB1I23], a notable application of the meet-in-the-middle technique, and a line of research started by Demirci and Selcuk [5], are the attacks on the Advanced Encryption Standard (AES). They presented cryptanalysis of AES-192 and AES-256 reduced to 8 rounds by improving the collision attack due to Gilbert and Minier m and with the use of the meet-in-the-middle technique. Later, their strategy was revisited by Dunkelman, Keller and Shamir [12], and most recently further improved by Derbez, Fouque and Jean mm. In this advanced form, the attack combines both the classical differential attack and the meet-in-the-middle strategy. In the differential attack, a high-probability differential is used to detect statistical biases to deduce information on the last subkey used in a block cipher. The attacker then detects correct subkey guesses by checking meet-in-the-middle equations during the encryption process. Namely, the attack starts with a precomputation phase which is used to fully tabulate the distinguishing behavior particular to the targeted cipher, e.g. AES, and later, in the online phase, the attacker searches for messages verifying the distinguisher by checking the precomputed table.

Our Contributions. We show the best known generic attacks on Feistel-2 and Feistel-3 cipher constructions. Our analysis, and a preliminary step of the attacks, relies on a special differential behavior of several consecutive rounds that is inherited from the generic Feistel construction. This property can be seen as a distinguisher; for Feistel-2 it extends to 5 rounds, while for Feistel-3 it extends to 7 rounds. The attacks exploit the distinguisher, and by adding rounds before, in the middle, and after the distinguisher, they can reach a higher number of rounds. The distinguisher allows the differential behavior of the Feistel rounds to be enumerated offline and without knowledge of the actual subkeys. This is in fact the first step of our attacks: a precomputation phase used to create a large look-up table. The next step is the collection of a sufficient number of plaintext/ciphertext pairs, some of which will comply with the conditions of the distinguisher. Each such pair suggests candidates for the round subkeys, and the look-up table is used to filter the correct subkeys. This step is the meet-in-the-middle part of the attack.

In the case of the Feistel-2 construction, the number of rounds that our attacks can reach depends on the ratio of key to state sizes $k/n$: the larger the ratio, the more rounds we can attack. Namely, $2s + 4$ rounds can be attacked for $k/n = (s+1)/2$, which translates to 6 rounds when $k = n$, 8 rounds for $k = 3n/2$, 10 rounds for $k = 2n$, etc. As long as the ratio increases, the number of attacked rounds grows. This property comes from the meet-in-the-middle nature of the attacks: when we increase the key size by one Feistel branch ($n/2$ bits), and thus allow the complexity of the attack to increase by this amount, we can add one round to the distinguisher in the offline phase and prepend one round in the online phase. Since the attack relies on the meet-in-the-middle strategy, the complexities of these two phases are not multiplied but simply added; hence the cumulative complexity remains below that of exhaustive key search. In the analysis of Feistel-2, regardless of the number of attacked rounds, we make no assumptions on the type of the round functions: they can be any invertible or one-way functions or permutations, unique for each round. What we assume, however, is that the round functions have standard differential behavior. That is, given a large set of input-output differences of these functions (which can be seen as a set of differentials), on average there is one solution that conforms to each differential.
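As a quick check of the relation between key size and attacked rounds, the hypothetical helper below simply evaluates the form stated in Section 3 ($2s + 4$ rounds for $k = n(s+1)/2$). It is arithmetic on that formula only, not part of the attack.

```python
from fractions import Fraction

def feistel2_rounds(s: int) -> tuple[Fraction, int]:
    """Hypothetical helper: (k/n, attacked rounds) for the Feistel-2 attack family,
    using k = n(s + 1)/2 and 2s + 4 attacked rounds."""
    if s < 1:
        raise ValueError("s must be a positive integer")
    return Fraction(s + 1, 2), 2 * s + 4

if __name__ == "__main__":
    for s in range(1, 4):
        ratio, rounds = feistel2_rounds(s)
        print(f"k/n = {ratio}: {rounds} rounds")
```

Each extra $n/2$ bits of key adds two rounds: one to the offline distinguisher and one prepended online.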

For the Feistel-3 construction and a linear diffusion layer P with maximal branch number, we can attack up to 14 rounds of the ciphers when the key is twice as large as the state ($k = 2n$), while for smaller keys we have attacks on 12 and 10 rounds, for key sizes $k = 3n/2$ and $k = n$, respectively. The above generalization (the number of attacked rounds always increases when the key size increases) is no longer possible, as the data complexity grows beyond the full codebook when the key size is more than $2n$ bits. To reach more rounds compared to Feistel-2, we use the SPN structure of the round function in both the offline and online stages of the attack. The best such example given in the paper is the redefinition of Feistel-3 by moving the linear layer from one round to the surrounding rounds: this allows us to extend the attack by an additional round. Other improvements based on the SPN structure are a better (in terms of number of rounds) distinguisher and key recovery. For the main Feistel-3 attacks, we assume that the P-layers of all rounds are the same, but in case they are different, we show that the attacks can be adapted with only one round less.
Table 1. Comparison of previous results and ours for n-bit block length, k-bit key length and c-bit S-Box length

                             #Rounds and complexity
Target     Round functions   k = n            k = 3n/2         k = 2n               Reference
Feistel-2  bijective         5   2^{3n/4}     6   2^{n}        7   2^{3n/2}         m
           -                 3   2^{n/2}      5   2^{n}        7   2^{3n/2}         m
                             5   2^{n/2}      7   2^{5n/4}     9   2^{3n/2}         m
           bij., ident.      6   2^{n/2}      -                -                    m
           -                 6   2^{3n/4}     8   2^{4n/3}     10  2^{11n/6}        Section 3
Feistel-3  -                 7   2^{3n/4+c}   9   2^{n+c}      11  2jn/4:-\-c       [18]
           -                 9   2^{n/2+4c}   11  2^{n+4c}     13  2^{3n/2+4c}      Section 4
           identical         10  2^{n/2+4c}   12  2^{n+4c}     14  2^{3n/2+4c}      Section 4


Our analysis results in the recovery of the full values (not only partial values or bytes) of certain subkeys. This is the main advantage of the attack: by repeating it a few times, we can recover all the subkeys one by one and thus be able to encrypt and decrypt without knowledge of the initial master key. Hence, the key schedule plays no role in the analysis, and the attacks are in fact all-subkey recovery attacks. We have also experimentally confirmed the validity of our analysis in the case of a small-state Feistel-2¹. The experiments, run on a regular PC, supported the complexity evaluation and the correctness of the attacks. All of the results described in this paper are summarized in Table 1 and compared to the already-published generic analysis of Feistel-2 and Feistel-3.

Due to space constraints, in the sequel we present only the main ideas that result in the 6-round attack on Feistel-2 and the 10-round attack on Feistel-3. The full version of the paper, including additional attacks, the technique to recover all the subkeys, and the experimental results, can be found in [15].

2 Preliminaries 

Throughout the paper, we assume that the block size is $n$ bits and the Feistel is balanced; thus the branch size is $n/2$ bits. The internal state value (the branch) is denoted by $v_i$, and the $n$-bit plaintext is assigned to $v_0 \| v_{-1}$. We count the rounds starting from 0, and at round $i$, $v_{i+1}$ is computed as $v_{i+1} \leftarrow \mathrm{RoundFunction}(v_i, v_{i-1}, K_i)$. The round function depends on the class defined below, i.e. it is either Feistel-2 or Feistel-3. In the description of the attacks, we omit the network twist in the last round, as it has no cryptographic significance.

Generic Feistel-2 Construction. A Feistel-2 round function consists of a subkey XOR and a subsequent public function, as illustrated in Figure 1. Several

1 The interested reader can find the implementations of our attacks at

http://www1.spms.ntu.edu.sg/~syllab/attacks/F2-6rounds.tar.gz and
http://www1.spms.ntu.edu.sg/~syllab/attacks/F2-8rounds.tar.gz
Fig. 1. Feistel-2
Fig. 2. Feistel-3
Fig. 3. Simplified Feistel-3


classes of public functions can be considered. Typical classifications are bijective 
or non-bijective, invertible or non-invertible, and different functions for different 
rounds or an identical function for all rounds. 
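To make the round structure concrete, here is a minimal Python sketch of a toy Feistel-2 under stated assumptions: branches of $n/2 = 8$ bits, each public function $F_i$ given as a random lookup table (so it need not be bijective), and the indexing of the Preliminaries ($v_{i+1} = v_{i-1} \oplus F_i(v_i \oplus K_i)$, plaintext $v_0 \| v_{-1}$, no final twist). The names `make_round_functions`, `encrypt` and `decrypt` are ours and purely illustrative; this is not the authors' implementation.

```python
import random

HALF = 8                      # toy branch size n/2 in bits (assumption)

def make_round_functions(rounds, seed=0):
    """Public round functions F_0..F_{r-1} as random lookup tables on n/2 bits
    (generally not bijective, as allowed for Feistel-2)."""
    rng = random.Random(seed)
    return [[rng.randrange(1 << HALF) for _ in range(1 << HALF)]
            for _ in range(rounds)]

def encrypt(F, K, v0, vm1):
    """Plaintext v0 || v_{-1}; round i computes v_{i+1} = v_{i-1} ^ F_i(v_i ^ K_i).
    The final twist is omitted, so the ciphertext is (v_{r-1}, v_r)."""
    prev, cur = vm1, v0
    for Fi, Ki in zip(F, K):
        prev, cur = cur, prev ^ Fi[cur ^ Ki]
    return prev, cur

def decrypt(F, K, left, right):
    """Inverse of encrypt; it only evaluates each F_i forwards, so one-way
    round functions are fine."""
    prev, cur = left, right                    # (v_{r-1}, v_r)
    for Fi, Ki in zip(reversed(F), reversed(K)):
        prev, cur = cur ^ Fi[prev ^ Ki], prev  # recover v_{i-1} from v_{i+1} and v_i
    return cur, prev                           # (v_0, v_{-1})

if __name__ == "__main__":
    F = make_round_functions(6, seed=1)
    K = [17, 42, 99, 3, 250, 128]
    ct = encrypt(F, K, 0x12, 0x34)
    assert decrypt(F, K, *ct) == (0x12, 0x34)
```

Decryption never needs $F_i^{-1}$, which is why the analysis can allow one-way round functions.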


Generic Feistel-3 Construction. A Feistel-3 round function consists of a subkey XOR, an S-layer, and a P-layer. The S-layer performs word-wise S-Box applications, while the P-layer performs a linear operation mixing all words. Several classes of S-layers and P-layers can be considered. An example of a classification of the S-layer is different S-Boxes for different words versus an identical S-Box for all words. The P-layers can be classified according to the branch number² of the linear transformation used in the layer. In our analysis, if $c$ is the bit size of a word, then the internal state value (one branch) has $n/(2c)$ words, and we assume that the branch number of the linear operation in the P-layer is $n/(2c) + 1$, i.e. it is maximal. For example, a multiplication by an MDS matrix produces the maximal branch number of $n/(2c) + 1$. The Feistel-3 construction is shown in Figure 2. We often use the simplified description given in Figure 3.
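The branch-number condition can be checked by brute force on a toy P-layer. The sketch below assumes $c = 4$ and a branch of two words (so $n/(2c) = 2$ and the maximal branch number is 3), with arithmetic in GF($2^4$) modulo $x^4 + x + 1$; the $2 \times 2$ matrix is an illustrative choice, not one taken from any cipher discussed here.

```python
from itertools import product

C = 4                         # word size c in bits (toy assumption)
POLY = 0b10011                # x^4 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Multiply two elements of GF(2^4): shift-and-add, reducing modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << C):
            a ^= POLY
    return r

def apply_matrix(M, x):
    """P-layer: multiply the word vector x by the matrix M over GF(2^4)."""
    out = []
    for row in M:
        acc = 0
        for coef, word in zip(row, x):
            acc ^= gf_mul(coef, word)
        out.append(acc)
    return out

def branch_number(M):
    """Minimum over non-zero x of (#non-zero words of x) + (#non-zero words of M x)."""
    best = 2 * len(M)
    for x in product(range(1 << C), repeat=len(M)):
        if any(x):
            y = apply_matrix(M, x)
            best = min(best, sum(w != 0 for w in x) + sum(w != 0 for w in y))
    return best

if __name__ == "__main__":
    print(branch_number([[1, 1], [1, 2]]))   # prints 3: maximal for two words
    print(branch_number([[1, 0], [0, 1]]))   # prints 2: identity is not MDS
```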

Solutions of Differential Equations. In our analysis, we make the following assumption on the non-linear round functions $F_i$ of the Feistel cipher. We assume that, given a large set of fixed input and output differences of $F_i$, i.e. $(\Delta_{I,j}, \Delta_{O,j})$, $j = 1, 2, \ldots$, on average there is one solution of each of the differential equations $F_i(X \oplus \Delta_{I,j}) \oplus F_i(X) = \Delta_{O,j}$, $j = 1, 2, \ldots$. That is, some of the equations may have many solutions and some none; however, we assume that on average (over a large set) the number of solutions is one per equation. This requirement is sufficient for our analysis, as we solve the differential equations for a large number of $(\Delta_{I,j}, \Delta_{O,j})$, and thus we can take the average case, which is one solution per equation. Our computer simulations of the attacks confirmed this expectation, and the complexity of the attacks was as predicted by our analysis, in part because the aforementioned assumption is true in the case of randomly chosen (Feistel-2 and Feistel-3) non-linear round functions. There are examples of round functions³ where the assumption does not hold, for instance, linear

2 The branch number of a linear transformation is the minimum number of active/non-zero input and output words over all inputs with at least one active/non-zero word.

3 We do not claim attacks on Feistel-2 that have this type of round functions. 
functions⁴. However, to the best of our knowledge, such round functions are either not used as building blocks of ciphers, or they can be attacked using other, more trivial attacks.

It is important to notice that although one solution is expected, it does not 
mean that it can be found trivially. To solve most of the equations, we use 
precomputation tables, i.e. we tabulate the functions, store their values, and 
later perform table lookups to solve the differential equations. 
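The tabulation strategy can be sketched as follows (toy $n/2 = 8$, random lookup-table $F$, hypothetical helper names). For a fixed input difference, a single pass over all inputs indexes each $x$ by its output difference, after which every equation $F(x \oplus \Delta_I) \oplus F(x) = \Delta_O$ is solved by one lookup. Note that, averaged over all output differences, the number of solutions is exactly one for any $F$, since each $x$ lands in exactly one bucket; the assumption in the text is that the particular equations used by the attack behave like this average.

```python
import random
from collections import defaultdict

HALF = 8                      # toy branch size n/2 in bits (assumption)

def ddt_rows(F, din):
    """Tabulate F once for a fixed input difference din: every x is stored under its
    output difference F(x) ^ F(x ^ din), so all solutions of
    F(x ^ din) ^ F(x) = dout are then found with one lookup of dout."""
    table = defaultdict(list)
    for x in range(len(F)):
        table[F[x] ^ F[x ^ din]].append(x)
    return table

def average_solutions(F, din):
    """Mean number of solutions over all 2^{n/2} output differences dout."""
    table = ddt_rows(F, din)
    return sum(len(xs) for xs in table.values()) / len(F)

if __name__ == "__main__":
    rng = random.Random(1)
    F = [rng.randrange(1 << HALF) for _ in range(1 << HALF)]   # random, non-bijective F
    table = ddt_rows(F, 0x2B)
    no_solution = sum(1 for d in range(1 << HALF) if d not in table)
    print("average:", average_solutions(F, 0x2B),
          "max:", max(len(xs) for xs in table.values()),
          "without solution:", no_solution)
```

Solutions always come in pairs $\{x, x \oplus \Delta_I\}$, so individual equations have 0, 2, 4, ... solutions even though the average is one.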

Definition 1 ($\delta$-Set, [7]). A $\delta$-set for a byte-oriented cipher is a set of $2^8$ state values that are all different in 1 byte and are all equal in the remaining bytes.

We introduce a slightly modified definition (without byte-oriented sets).

Definition 2 ($b$-$\delta$-Set). A $b$-$\delta$-set is a set of $2^b$ state values that are all different in $b$ state bits (the active bits) and are all equal in the remaining state bits (the inactive bits).

By this definition, Knudsen's original $\delta$-set from [7] can be seen as an 8-$\delta$-set, since it takes all the values of a particular byte, which is an 8-bit value. To define a $b$-$\delta$-set, we have to specify not only the value of $b$, but also the positions of the active bits. In some cases, however, the positions are irrelevant and the analysis is applicable for any $b$ active bits.

Given a state value $v$, we can construct a $b$-$\delta$-set from $v$ by applying $2^b - 1$ differences to some $b$ bits of the state $v$. Furthermore, we can take a function $F$, order all the possible $2^b - 1$ input differences, and obtain a sequence of output differences of $F$. An example of such a sequence, when the active bits are the least significant bits, is $F(v) \oplus F(v \oplus 1), F(v) \oplus F(v \oplus 2), \ldots, F(v) \oplus F(v \oplus (2^b - 1))$.
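A $b$-$\delta$-set and the associated ordered sequence of output differences can be generated directly from the two definitions. In this sketch the state is an integer, the active bits start at a hypothetical bit offset `shift`, and $F$ is any lookup table; the function names are ours.

```python
def delta_set(v, b, shift=0):
    """b-delta-set from state v: the 2^b values that agree with v outside the b active
    bits at positions shift .. shift + b - 1 and take every value on those bits."""
    return [v ^ (j << shift) for j in range(1 << b)]

def output_difference_sequence(F, v, b):
    """Ordered sequence F(v) ^ F(v ^ j), j = 1, ..., 2^b - 1 (active bits = b LSBs)."""
    return [F[v] ^ F[v ^ j] for j in range(1, 1 << b)]

if __name__ == "__main__":
    # Knudsen's byte-wise delta-set is the special case b = 8 on one byte.
    byte_set = delta_set(0x1234, 8, shift=8)
    print(len(byte_set), hex(min(byte_set)), hex(max(byte_set)))
```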

The Attack Model. The key-recovery attacks presented in the paper follow 
the standard attack model. That is, the key of the block cipher is secret and 
chosen uniformly at random. The attacker can query both the encryption and 
the decryption functions of the block cipher. His task is to recover the secret 
key (or the subkeys produced from the key schedule) based on the queries. We 
explicitly state that the attacker has no information about the internal state 
values of the block cipher. 

3 Key-Recovery Attacks against Feistel-2 Construction 

In this section, we present a key-recovery attack on 6-round Feistel-2 ciphers for the case when the key and the state sizes are equal, i.e. $k = n$. The extensions of the attack to 8 rounds for $k = 3n/2$, 10 rounds for $k = 2n$, and in general to $(4 + 2s)$ rounds for $k = n(s+1)/2$, can be found in the full version of the paper [15]. In our attack, the round functions can be either bijective or non-bijective, i.e. permutations or functions, and they can even be one-way. To make

4 For linear functions, the probability that a solution exists depends on the size of the large set.
Fig. 4. 5-round differential characteristic
Fig. 5. b-δ-set construction
Fig. 6. 6-round key recovery

the attack applicable to the most general type of constructions, in the sequel, 
we assume that the round functions are one-way and pairwise distinct. 

We use $F_i$ to denote the round function at round $i$ of the construction. To refer to the input (resp. output) of $F_i$, we write $F_i^I$ (resp. $F_i^O$). Similarly, the input difference (resp. output difference) of $F_i$ is denoted by $\Delta F_i^I$ (resp. $\Delta F_i^O$). Recall that the two branches, as well as the subkeys $K_i$, have $n/2$ bits each.

The 6-round key-recovery attack is based on a non-ideal behavior of 5 rounds of Feistel-2, which is described by the lemma and the proposition that follow. In the 6-round attack (refer to Figure 6), this distinguisher is used in the last five rounds.

Lemma 1. Let $X$ and $X'$, where $X \neq X'$, be two non-zero branch differences. If a 5-round Feistel-2 encrypts a pair of plaintexts $(m, m')$ with difference $0 \| X$ to a pair of ciphertexts with difference $0 \| X'$, then the number of possible internal state values of the three middle rounds that correspond to the plaintext $m$ is limited to $2^{n/2}$ on average.

Proof. Note that $n/2$-bit round keys are added in each round, and hence the number of possible internal state values for the three middle rounds is limited only by their size, $2^{3n/2}$. We show, however, that the bound can be tightened to $2^{n/2}$.

A 5-round differential characteristic with input difference $0 \| X$ and output difference $0 \| X'$ is depicted in Figure 4 (the rounds are denoted from $i+1$ to $i+5$ to make this part of the analysis generic). From the figure, we can see that after the first round, the input difference $(0, X)$ must become the state difference $(X, 0)$, i.e. $\Delta v_{i+2} = X$. Similarly, inverting the last round shows that $\Delta v_{i+5} = 0$ and $\Delta v_{i+4} = X'$. This makes $\Delta F_{i+3}^O$ equal to $X'' = X \oplus X'$. Since $X \neq X'$, it follows that $X'' \neq 0$ and thus $\Delta F_{i+3}^I \neq 0$; let us denote this difference by $\Delta$. It means that both $\Delta F_{i+2}^O$ and $\Delta F_{i+4}^O$ also have the difference $\Delta$. To summarize, for each fixed $\Delta$, the input and output differences of the round functions at rounds $i+2$, $i+3$, and $i+4$ are fixed. Therefore, on average there exists one state value (one solution) that satisfies such input-output differences in each of the three rounds. As $\Delta$ can take at most $2^{n/2}$ different values (one branch has $n/2$ bits), the states in rounds $i+2$, $i+3$, $i+4$ can assume only $2^{n/2}$ different values. In Figure 4, the fixed value for each $\Delta$ is drawn by a bold line. □

We use Lemma 1 to prove the proposition below, which will help us later to launch the attack on 6 rounds. To present the proposition, we need additional notation. Let $F: m \mapsto F(m)$ be a 5-round Feistel-2 (we omit writing the key $k$ as input) and let the function $F_\Delta: \{0,1\}^n \times \{0,1\}^{n/2} \to \{0,1\}^{n/2}$ be defined as $F_\Delta(m, \delta) = \mathrm{Trunc}_{n/2}\bigl(F(m) \oplus F(m \oplus (0 \| \delta))\bigr)$, where $\mathrm{Trunc}_{n/2}$ denotes the truncation to the first $n/2$ bits. In other words, $F_\Delta(m, \delta)$ gives the output difference (of the left branch) in the pair of ciphertexts produced by encryption of the pair of plaintexts $(m, m \oplus (0 \| \delta))$ with the 5-round Feistel-2. Furthermore, instead of taking a single pair of plaintexts, let us create several pairs such that in each pair, the first element is always $m$, while the second is $m \oplus (0 \| \delta_j)$, where $\delta_j = 1, \ldots, 2^b - 1$ (the precise value of $b$ is defined later in the section). In fact, the second elements of the pairs form a $b$-$\delta$-sequence. The proposition below claims that the sequence of differences in the ciphertext pairs (that correspond to such plaintext pairs) can take only $2^{n/2}$ values.

Proposition 1. Let $(m, m')$ be a pair of plaintexts that conforms to the 5-round differential characteristic given in Figure 4, and let $\delta_j = 1, \ldots, 2^b - 1$, $b > 1$, form a $b$-$\delta$-sequence. Then, the sequence $F_\Delta(m, \delta_j)$, $\delta_j = 1, \ldots, 2^b - 1$, can assume only $2^{n/2}$ possible values.

Remark 1. We note that the sequence can be constructed from either of the two plaintexts $m$ or $m'$ given in Proposition 1, as long as the pair $(m, m')$ conforms to the differential characteristic.

Remark 2. From a theoretical point of view, Proposition 1 yields a distinguisher, since the number of functions reached by the 5-round Feistel-2 construction is much less than the theoretical number of functions from a set of $2^b$ elements to a set of $2^{n/2}$ elements when $b > 1$. Indeed, for a fixed $m$, the latter equals $(2^{n/2})^{2^b} = 2^{2^b n/2}$, whereas it is only $2^{n/2}$ in the case of the 5-round Feistel-2 construction.

Proof. The initial pair of plaintexts $(m, m')$ is only used to compute the state values of the three middle rounds that correspond to the plaintext $m$. We have seen from Lemma 1 that these three states can take only $2^{n/2}$ possible values (each of them corresponds to one of the values of $\Delta$). We will show that if the values of these three states are fixed, then we can change the right half of the plaintext (instead of $m$, we take $m \oplus (0 \| \delta)$) and still be able to compute the output difference in the left half of the ciphertexts. In fact, we can change the value of the plaintext many times (i.e. we can produce many pairs of the form $(m, m \oplus (0 \| \delta_j))$), and for each of them, we can easily compute the output difference in the left halves of the ciphertexts. The number of plaintext pairs adds no complexity in predicting the ciphertext difference: once the three middle states are fixed (and they can have only $2^{n/2}$ different values), the sequence of differences in the ciphertext pairs is uniquely determined.
Assume the difference $\Delta$ is fixed⁵, and thus the three internal state values are fixed. Let $t_{i+2}, t_{i+3}, t_{i+4}$ be the input values to $F_{i+2}, F_{i+3}, F_{i+4}$ that correspond to the plaintext $m$, where $t_{i+2}, t_{i+3}, t_{i+4}$ are determined by $\Delta$. Let $v_i$ be the values of the states that correspond to the plaintext $m$, as shown in Fig. 5. Let us consider a new pair of plaintexts, $(m, m \oplus (0 \| \delta_j))$, i.e. we introduce a difference $\delta_j$ to the right branch, $\Delta v_i = \delta_j$. Since the difference $\Delta F_{i+1}^O$ is always zero, we obtain $\Delta v_{i+2} = \Delta v_i = \delta_j$. In round $i+2$, the attacker knows the value $F_{i+2}^I = t_{i+2}$ and the difference $\Delta F_{i+2}^I = \delta_j$. Hence, the new paired values of $F_{i+2}^I$ are $t_{i+2}$ and $t_{i+2} \oplus \delta_j$. Therefore, the new $\Delta F_{i+2}^O$ can be obtained as $\Delta F_{i+2}^O \leftarrow F_{i+2}(t_{i+2}) \oplus F_{i+2}(t_{i+2} \oplus \delta_j)$. In Figure 5, we represent this type of computable difference with V. The new difference $\Delta F_{i+2}^O$ is propagated forward to $v_{i+3}$, and the same reasoning as in round $i+2$ is applied to round $i+3$. As we know the value $F_{i+3}^I = t_{i+3}$ and $\Delta F_{i+3}^I = \Delta F_{i+2}^O$, it follows that $(t_{i+3}, t_{i+3} \oplus \Delta F_{i+2}^O)$ are the paired values. The new $\Delta F_{i+3}^O$ can therefore be computed as $\Delta F_{i+3}^O \leftarrow F_{i+3}(t_{i+3}) \oplus F_{i+3}(t_{i+3} \oplus \Delta F_{i+2}^O)$. The knowledge of $\Delta F_{i+3}^O$ gives the difference of $v_{i+4}$ for the next round, namely $\Delta v_{i+4} \leftarrow \Delta F_{i+3}^O \oplus \delta_j$. The analysis continues the same way for round $i+4$. From the knowledge of the value $F_{i+4}^I = t_{i+4}$ and the new difference $\Delta F_{i+4}^I = \Delta v_{i+4}$, the output difference $\Delta F_{i+4}^O$ of the round function is computed, and finally $\Delta v_{i+5}$ is computed as $\Delta F_{i+4}^O \oplus \Delta v_{i+3} = \Delta F_{i+4}^O \oplus \Delta F_{i+2}^O$.

In summary, for an arbitrary $\delta_j$, we can compute the output difference $\Delta v_{i+5}$, i.e., the mapping from $\delta_j$ to $\Delta v_{i+5}$ becomes deterministic (as long as $\Delta$ is fixed). Therefore, for the ordered sequence of $\delta_j$ that takes the values $1, 2, \ldots, 2^{n/2} - 1$, we can determine the sequence of corresponding differences $\Delta v_{i+5}$ (which is indeed the difference in the left half of the ciphertext). We emphasize that the mapping depends only on the values of $t_{i+2}, t_{i+3}, t_{i+4}$, which in turn are determined from the values of $\Delta$, $X$ and $X'$, and acts independently of the value of $m$. Since $\Delta$ takes at most $2^{n/2}$ values, the number of sequences of $\Delta v_{i+5}$ is limited to $2^{n/2}$. □
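The forward computation in the proof can be checked numerically. The sketch below (toy $n/2 = 8$, random lookup-table round functions, hypothetical names, and $i = 0$ so the five rounds are 1 to 5) computes $\Delta v_5$ from $t_2, t_3, t_4$ and $\delta$ alone, exactly as in the proof, and compares it with the left-half difference obtained by actually encrypting $(m, m \oplus (0 \| \delta))$. The derivation does not use the conformance of $(m, m')$, so the check is run on an arbitrary $m$; conformance only restricts which triples $(t_2, t_3, t_4)$ can occur.

```python
import random

HALF = 8                      # toy branch size n/2 in bits (assumption)
SIZE = 1 << HALF

def encrypt_rounds(F, K, left, right, first, last):
    """Rounds first..last of a toy Feistel-2 on plaintext (v_first, v_{first-1}).
    Round j computes v_{j+1} = v_{j-1} ^ F[j][v_j ^ K[j]]; no final twist.
    Returns the ciphertext (v_last, v_{last+1}) and the F-inputs t[j] = v_j ^ K[j]."""
    v = {first - 1: right, first: left}
    t = {}
    for j in range(first, last + 1):
        t[j] = v[j] ^ K[j]
        v[j + 1] = v[j - 1] ^ F[j][t[j]]
    return (v[last], v[last + 1]), t

def predicted_dv5(F, t, delta):
    """Delta v_5 for the pair (m, m ^ (0 || delta)), from t[2], t[3], t[4] of m only."""
    d2 = F[2][t[2]] ^ F[2][t[2] ^ delta]       # dF2 out (dv2 = delta); equals dv3
    d3 = F[3][t[3]] ^ F[3][t[3] ^ d2]          # dF3 out for input difference dv3
    dv4 = delta ^ d3                           # dv4 = dv2 ^ dF3 out
    d4 = F[4][t[4]] ^ F[4][t[4] ^ dv4]         # dF4 out for input difference dv4
    return d2 ^ d4                             # dv5 = dv3 ^ dF4 out

def check_proposition(seed, b=4):
    """Compare the prediction with real encryptions for delta = 1, ..., 2^b - 1."""
    rng = random.Random(seed)
    F = {j: [rng.randrange(SIZE) for _ in range(SIZE)] for j in range(1, 6)}
    K = {j: rng.randrange(SIZE) for j in range(1, 6)}
    m = (rng.randrange(SIZE), rng.randrange(SIZE))
    (c_left, _), t = encrypt_rounds(F, K, m[0], m[1], 1, 5)
    for delta in range(1, 1 << b):
        (c2_left, _), _ = encrypt_rounds(F, K, m[0], m[1] ^ delta, 1, 5)
        if c_left ^ c2_left != predicted_dv5(F, t, delta):
            return False
    return True

if __name__ == "__main__":
    print(check_proposition(2014))
```

Neither $K_5$ nor the values of $m$ outside $t_2, t_3, t_4$ enter `predicted_dv5`, which is the point of the proposition.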

6-Round Key-Recovery Attack. We prepend one round to the 5-round distinguisher shown in Figure 4, and the resulting construction is illustrated in Figure 6. The attack consists of a precomputation phase and an online phase. The online phase is further divided into a pair-collection phase and a key-recovery phase. In the precomputation phase, we choose many pairs $(X, X')$, where $X$ is fixed while $X'$ takes multiple values, and for each pair, we find all $2^{n/2}$ possible sequences of $\Delta v_5$ based on Proposition 1. We store all the sequences in a large table along with their corresponding internal state values. Next, in the online phase, we collect many pairs that satisfy one of the differential characteristics $(X, 0) \to (X', 0)$. Finally, for each of the obtained pairs, we compute $\Delta v_5$ sequences by guessing the first round key $K_0$. We then find a match of $\Delta v_5$ sequences between the precomputed table and the one computed online; this allows us to determine the internal states and to recover $K_0$. The meet-in-the-middle nature of our attack comes from the fact that the $\Delta v_5$ sequence is computed offline for the last

5 Recall that this difference corresponds to an internal state difference for the plaintext pair $(m, m')$.
five rounds and online for the first round, and the results are later matched in a meet-in-the-middle-like fashion.

Precomputation. From Proposition 1, the number of possible sequences of $\Delta v_5$ is $2^{n/2}$ for a fixed $X$ and a fixed $X'$. We can achieve a time/memory tradeoff by relaxing the $n/2$-bit constraint of a fixed $X'$ and allowing $2^{x'}$ different possible differences for $X'$, where $0 < x' < n/2$. Without loss of generality, assume that the values of $X'$ differ in the last $x'$ bits and are the same in the remaining $n/2 - x'$ most significant bits (MSBs). In the sequel, we will determine the optimal value of $x'$ to reach the best time/data/memory complexities for the attack.

First, we show how to compute all $2^{x'} \cdot 2^{n/2} = 2^{x'+n/2}$ sequences of $2^b$ differences as an offline precomputation in $2^{x'+n/2+b}$ time (encryptions) and $2^{x'+n/2+b}$ memory (blocks of $n/2$ bits). This offline precomputation results in a table $T_\delta$ that contains all the sequences. Since the precomputation step is the same for all $X'$ differences, we describe the procedure for a particular $X'$ and assume that for the whole offline execution this procedure is repeated $2^{x'}$ times for the possible values of $X'$.

In rounds 2 and 4, the input differences to the round functions are fixed to $X$ and $X'$, respectively, while both of the output differences are $\Delta$. To reduce the time complexity, we first completely tabulate the round functions $F_2$, $F_3$ and $F_4$, and thus we will have constant-time access to paired values for some input or output differences. Namely, we construct precomputation tables $T_2$ and $T_4$, which take the difference $\Delta$ as input and return the paired values conforming to the differentials $X \to \Delta$ and $X' \to \Delta$ through $F_2$ and $F_4$, respectively. The strategy consists simply in iterating over all possible inputs and storing the results indexed by output difference, as described in Algorithm 1.

Similarly, in round 3 we want to construct the table $T_3$ that gives, in constant time, a paired-value input to $F_3$ resulting in the fixed output difference $X'' = X \oplus X'$. However, since the function $F_3$ is assumed to be one-way and in the attack we need to invert it, we cannot compute $F_3^{-1}$ to construct $T_3$. Thus, we first evaluate $F_3$ for all input values, store the values in a temporary table, and later consider the differences, as detailed in Algorithm 2. After this part of the precomputation phase, for an arbitrary fixed difference $\Delta$ (which is the difference $\Delta F_2^O = \Delta F_3^I = \Delta F_4^O$), the corresponding state values in rounds 2, 3, and 4 can be looked up in tables $T_2$, $T_3$, and $T_4$ in constant time. Hence, we can compute the $b$-$\delta$-set for all the $2^{n/2}$ possible choices of $\Delta$ and store the resulting sequences in the precomputation table, which is later used for the meet-in-the-middle check of the online phase. This step is described in Algorithm 3.
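The offline phase can be sketched for a single $(X, X')$ under the same toy assumptions ($n/2 = 8$, random lookup tables, $i = 0$). The tables below play the roles of $T_2$, $T_3$, $T_4$ and $T_\delta$, but Algorithms 1 to 3 are not reproduced in this section, so this is our own reconstruction from the prose, with hypothetical names. `demo` builds a plaintext pair conforming to $(0 \| X) \to (0 \| X')$ from a table entry (choosing $K_4$ in the toy instance so that the chosen $t_4$ is reached) and checks that its online $\Delta v_5$ sequence appears in $T_\delta$.

```python
import random
from collections import defaultdict

HALF = 8                      # toy branch size n/2 in bits (assumption)
SIZE = 1 << HALF

def diff_table(Fj, din):
    """T2/T4 role: output difference -> inputs x with Fj[x] ^ Fj[x ^ din] equal to it."""
    table = defaultdict(list)
    for x in range(SIZE):
        table[Fj[x] ^ Fj[x ^ din]].append(x)
    return table

def input_diff_table(Fj, dout):
    """T3 role: input difference D -> inputs x with Fj[x] ^ Fj[x ^ D] == dout.
    Fj is never inverted: preimages come from a tabulated value -> inputs index."""
    preimages = defaultdict(list)
    for x in range(SIZE):
        preimages[Fj[x]].append(x)
    table = defaultdict(list)
    for x in range(SIZE):
        for y in preimages.get(Fj[x] ^ dout, []):
            table[x ^ y].append(x)
    return table

def dv5(F, t2, t3, t4, delta):
    """Proposition 1 (i = 0): left ciphertext difference for (m, m ^ (0 || delta))."""
    d2 = F[2][t2] ^ F[2][t2 ^ delta]
    dv4 = delta ^ F[3][t3] ^ F[3][t3 ^ d2]
    return d2 ^ F[4][t4] ^ F[4][t4 ^ dv4]

def build_T_delta(F, X, Xp, b):
    """Offline phase for one (X, X'): dv5 sequence -> list of (Delta, t2, t3, t4)."""
    T2, T4 = diff_table(F[2], X), diff_table(F[4], Xp)
    T3 = input_diff_table(F[3], X ^ Xp)          # fixed output difference X'' = X ^ X'
    T_delta = defaultdict(list)
    for Delta, t2_list in T2.items():
        for t2 in t2_list:
            for t3 in T3.get(Delta, []):
                for t4 in T4.get(Delta, []):
                    seq = tuple(dv5(F, t2, t3, t4, d) for d in range(1, 1 << b))
                    T_delta[seq].append((Delta, t2, t3, t4))
    return T_delta

def encrypt_1_to_5(F, K, left, right):
    """Rounds 1..5 of the toy Feistel-2 on (v1, v0), no twist: returns (v5, v6)."""
    v = {0: right, 1: left}
    for j in range(1, 6):
        v[j + 1] = v[j - 1] ^ F[j][v[j] ^ K[j]]
    return v[5], v[6]

def demo(seed=2014, b=4, X=0x01, Xp=0x80):
    rng = random.Random(seed)
    F = {j: [rng.randrange(SIZE) for _ in range(SIZE)] for j in range(1, 6)}
    K = {j: rng.randrange(SIZE) for j in range(1, 6)}
    T_delta = build_T_delta(F, X, Xp, b)
    _, t2, t3, t4 = next(iter(T_delta.values()))[0]
    v2, v3 = t2 ^ K[2], t3 ^ K[3]
    v1 = v3 ^ F[2][t2]                     # from v3 = v1 ^ F2(t2)
    v0 = v2 ^ F[1][v1 ^ K[1]]              # from v2 = v0 ^ F1(v1 ^ K1)
    K[4] = v2 ^ F[3][t3] ^ t4              # toy instance: pick K4 so F4's input is t4
    c, c_pair = encrypt_1_to_5(F, K, v1, v0), encrypt_1_to_5(F, K, v1, v0 ^ X)
    conforms = (c[0] ^ c_pair[0], c[1] ^ c_pair[1]) == (0, Xp)
    online = tuple(c[0] ^ encrypt_1_to_5(F, K, v1, v0 ^ d)[0] for d in range(1, 1 << b))
    return conforms, online in T_delta

if __name__ == "__main__":
    print(demo())
```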

Finally, another table $T_0$ of size $2^{n/2}$ is generated to make the online phase and the recovery of the subkey $K_0$ more efficient. That is, in round 0, for all values of $F_0^I$, the corresponding $\Delta F_0^O$ is computed. Namely, for $i = 0, 1, \ldots, 2^{n/2} - 1$, $F_0(i) \oplus F_0(i \oplus X)$ is computed and stored in $T_0$.
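In the online phase, a table like $T_0$ turns a collected pair directly into candidates for $K_0$: if the pair has left halves $v_0$ and $v_0 \oplus X$ and right-half difference $d$, round 0 cancels $d$ exactly when $F_0(v_0 \oplus K_0) \oplus F_0(v_0 \oplus K_0 \oplus X) = d$. The sketch below indexes the tabulated values by output difference so that this becomes a single lookup; this organisation and all names are our assumption, with the same toy parameters as before.

```python
import random
from collections import defaultdict

HALF = 8                      # toy branch size n/2 in bits (assumption)
SIZE = 1 << HALF

def build_T0(F0, X):
    """Store every input i of F0 under the output difference F0(i) ^ F0(i ^ X)."""
    T0 = defaultdict(list)
    for i in range(SIZE):
        T0[F0[i] ^ F0[i ^ X]].append(i)
    return T0

def k0_candidates(T0, v0, d):
    """For a pair with left halves v0, v0 ^ X and right-half difference d, round 0
    cancels d (so dv1 = 0) exactly when F0's input i = v0 ^ K0 lies in T0[d]."""
    return [v0 ^ i for i in T0.get(d, [])]

if __name__ == "__main__":
    rng = random.Random(5)
    F0 = [rng.randrange(SIZE) for _ in range(SIZE)]
    K0, v0, X = 0x3C, 0xA5, 0x11          # secret key, known left half, fixed X
    d = F0[v0 ^ K0] ^ F0[v0 ^ X ^ K0]     # right-half difference cancelled by round 0
    print(sorted(k0_candidates(build_T0(F0, X), v0, d)))
```

Because $i$ and $i \oplus X$ solve the same equation, every pair yields at least the two candidates $K_0$ and $K_0 \oplus X$; the match against $T_\delta$ is still needed to filter them.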
As stated previously, we repeat this procedure for $2^{x'}$ different choices of the difference $X'$. For the sake of simplicity, the resulting tables for each $X'$ are all merged into the same table $T_\delta$. For a fixed choice of $X'$, building $T_0$, $T_2$, $T_3$ and $T_4$ requires $2^{n/2}$ round function computations each. Hence, constructing $T_\delta$ requires less⁶ than $2^b \cdot 2^{n/2}$ encryptions. The entire analysis is iterated over $2^{x'}$ choices of $X'$, so that the computational cost is less than $2^{x'+b+n/2}$ encryptions. The memory requirement to build $T_0$, $T_2$, $T_3$ and $T_4$ is $2^{n/2}$ blocks of $n/2$ bits, and is constant, as we can reuse the memory across different $X'$. The size of $T_\delta$ increases with the iteration over the $2^{x'}$ choices of $X'$; namely, the memory requirement for the precomputation phase amounts to $2^b \cdot 2^{x'+n/2} = 2^{x'+n/2+b}$ blocks of $n/2$ bits.

Collecting Pairs. In the data collection phase, we query the encryption oracle with chosen plaintexts to get enough pairs such that one conforms to the whole 6-round differential characteristic. To do so, we construct a structure of $2^{n/2+1}$ plaintexts that consists of two lists of size $2^{n/2}$. All the elements of the first list are fixed to a constant random value $v_0$ on their left half, while the right halves are pairwise distinct. The second list is constructed similarly, except that the left half is fixed to $v_0 \oplus \alpha$. As a result, we have $2^n$ pairs of plaintexts such that the difference in the left half equals $\alpha$ and the right half is nonzero.

For a single structure, the data complexity corresponds to the encryption of $2^{n/2+1}$ chosen plaintexts, which can subsequently be sorted by their ciphertext values to detect the pairs that match on their left half ($n/2$ bits) and on the $n/2 - x'$ most significant bits of the right half. Consequently, we expect one structure of plaintexts to provide $2^n / 2^{n/2 + n/2 - x'} = 2^{x'}$ pairs conforming to the truncated output difference, i.e. such that only the $x'$ least significant bits of the right half are nonzero. To complete the attack, we need $2^{n/2}$ pairs, as the difference cancellation at the output of the first round holds with probability $2^{-n/2}$. Hence, by repeating the data collection for $2^{n/2 - x'}$ different values of $v_0$, we can expect one pair among the $2^{n/2}$ to follow the whole characteristic. Therefore, the data complexity amounts to $2^{n/2 - x'} \times 2^{n/2+1} = 2^{n - x' + 1}$ chosen plaintexts; generating them requires the same number of memory accesses as the time complexity, and they can be stored using only $2^{n/2}$ elements by keeping, in a hash table, only the pairs that verify the truncated output difference. The whole procedure is described in Algorithm 4.
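The filtering step (step 7 of Algorithm 4) amounts to a hash join on the $n - x'$ most significant ciphertext bits. The Python sketch below illustrates it under two assumptions: ciphertexts are packed as $n$-bit integers with the left half in the high bits, and random values stand in for the encryption oracle. The name `matching_pairs` and the toy sizes ($n = 16$, $x' = 4$) are hypothetical.

```python
import random
from collections import defaultdict


def matching_pairs(L0, L1, x_prime):
    """Pair elements of L0 x L1 whose n-bit ciphertexts agree on all but the
    x' least significant bits, by hashing on the n - x' most significant
    bits. Each list holds (plaintext, ciphertext) tuples."""
    buckets = defaultdict(list)
    for p0, c0 in L0:
        buckets[c0 >> x_prime].append((p0, c0))  # drop the x' LSBs
    pairs = []
    for p1, c1 in L1:
        for p0, c0 in buckets.get(c1 >> x_prime, []):
            pairs.append(((p0, c0), (p1, c1)))
    return pairs


# Toy structure: n = 16, two lists of 2^{n/2} = 256 entries, with random
# ciphertexts standing in for real oracle answers.
rng = random.Random(1)
n, x_prime = 16, 4
L0 = [(p, rng.randrange(1 << n)) for p in range(256)]
L1 = [(p, rng.randrange(1 << n)) for p in range(256)]
pairs = matching_pairs(L0, L1, x_prime)
```

With these sizes, about $2^{16}/2^{12} = 16$ matching pairs are expected, which is the toy analogue of the $2^{x'}$ pairs per structure stated in the text.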

Recovery of $K_0$. The previous phase results in $2^{n/2}$ candidate pairs with a plaintext difference $(\alpha, \Delta v_{-1})$ and an appropriate ciphertext difference. For each pair, we match against the precomputed table $T_0$ to find the corresponding value of $F_0^I$, and thus uniquely determine a subkey candidate for $K_0$ by $K_0 \leftarrow v_0 \oplus F_0^I$.

However, among these $2^{n/2}$ candidates for $K_0$, only one is correct, while the remaining ones are false positives. To find the correct subkey, we use the results of Proposition 1 and the precomputation table $T_\delta$, i.e. we construct a $b$-$\delta$-set by modifying the active bits of $v_0$. For each modified plaintext, with the knowledge of $K_0$, we compute the corresponding $F_0^O$ and modify $v_{-1}$ so that the value of $v_1$ stays unchanged. Then, we query the plaintexts and observe the left half of the


Less, as one evaluation of the round functions costs less than one encryption query. 
Algorithm 1. Construction of the tables $T_2$ and $T_4$

1: for $i = 0, 1, \ldots, 2^{n/2} - 1$ do
2:     Compute $\Delta F_2^O \leftarrow F_2(i) \oplus F_2(i \oplus \alpha)$.
3:     Store $(i, \Delta F_2^O)$ in $T_2$ indexed by $\Delta F_2^O$.
4:     Compute $\Delta F_4^O \leftarrow F_4(i) \oplus F_4(i \oplus \alpha')$.
5:     Store $(i, \Delta F_4^O)$ in $T_4$ indexed by $\Delta F_4^O$.


Algorithm 2. Construction of the table $T_3$

1: for $i = 0, 1, \ldots, 2^{n/2} - 1$ do
2:     Store $(i, F_3(i))$ in a temporary table tmp indexed by $F_3(i)$.
3: for $i = 0, 1, \ldots, 2^{n/2} - 1$ do
4:     Compute $F_3(i) \oplus \alpha''$.
5:     Look up tmp to obtain $j$ such that $F_3(j) = F_3(i) \oplus \alpha''$.
6:     Store $(i, i \oplus j)$ in $T_3$ indexed by $i \oplus j$.


Algorithm 3. Construction of the sequences of $\Delta v_5$

1: for $\Delta = 1, \ldots, 2^{n/2} - 1$ do
2:     Obtain the internal state values $F_2^I$, $F_3^I$ and $F_4^I$ by looking up $T_2$, $T_3$ and $T_4$, respectively.
3:     for all $b$ active bits of the $b$-$\delta$-set do
4:         Modify $\Delta v_0$, and compute the corresponding $\Delta v_5$.
5:     Compute the sequence of $\Delta v_5$ and add it to $T_\delta$.


Algorithm 4. Data collection phase of the 6-round attack

1: Choose $2^{x'}$ differences $\alpha'$ so that the $n/2 - x'$ MSBs of $\alpha'$ are 0 for all $\alpha'$.
2: Choose a difference X such that I/I 7 . 

3: for 2 n / 2-x different values of vo do 
4: for all 2 n//2 choices of z;_i do 

5: Query (z;o,^-i) and store it in Lo sorted by the ciphertext value. 

6: Query (r>o ® A, v-i) and store it in L\ sorted by the ciphertext value. 

7: Pick up the elements of Lo x L\ whose ciphertexts match 

in the n — x most significant bits. 


corresponding ciphertexts. Hence, we can compute the sequence of $\Delta v_5$. If this sequence is included in the precomputation table $T_\delta$, $K_0$ is a correct guess with high probability; otherwise it is wrong. We note that this does not increase the data complexity, since the structures of plaintexts already include the plaintexts for the $b$-$\delta$-set evaluation.

Complexity Analysis. In the online phase of the attack, we perform $2^{n/2}$ checks in the precomputed table $T_\delta$, which contains all the possible stored sequences of differences. If we do not store enough information in this table (if $b$ is too small), many checks will wrongly yield valid subkey candidates $K_0$. On the other hand, if we store too much information (if $b$ is too large), the table will require higher time and memory complexity to be constructed. Thus, we need to select an optimal value of $b$. One check yields a false positive with probability $2^{n/2} / 2^{n 2^b / 2} = 2^{n(1 - 2^b)/2}$, as there are $2^{n/2}$ valid sequences of $2^b$ elements among the $2^{n 2^b / 2}$ theoretically possible ones. Therefore, we want $n(1 - 2^b)/2 + n/2 < 0$, so that among all the $2^{n/2}$ checks, only the correct one results in a stored element, and thus $b \geq 2$.
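The bound on $b$ can be checked numerically. The sketch below evaluates the base-2 logarithm of the expected number of false positives, $n(1 - 2^b)/2 + n/2$, exactly as stated above, and returns the smallest $b$ that makes it negative. The function names are illustrative.

```python
def false_positive_log2(n, b):
    """log2 of the expected number of wrong matches over the 2^{n/2} online
    checks, each succeeding by chance with probability 2^{n(1 - 2^b)/2}."""
    return n * (1 - 2 ** b) / 2 + n / 2


def smallest_b(n):
    """Smallest integer b for which no false positive is expected."""
    b = 0
    while false_positive_log2(n, b) >= 0:
        b += 1
    return b
```

For $b = 1$ the exponent is exactly 0, i.e. one false positive is still expected, so $b = 2$ is the smallest value for every $n$; for example, `smallest_b(128)` returns 2 with exponent $-128$.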

In terms of tradeoff, adjusting the value $x'$ balances the data, time and memory complexities. The data complexity is $2^{n - x' + 1}$ chosen plaintexts, and the time complexity is $2^{x' + n/2}$ encryptions to construct $T_\delta$ plus $2^{n - x' + 1}$ memory accesses to query the encryption oracle. The memory complexity is also $2^{x' + n/2}$ blocks of $n/2$ bits, required to store $T_\delta$. Consequently, the choice $x' = n/4$ makes the data complexity about $2^{3n/4}$ chosen plaintexts, the time complexity about $2^{3n/4}$ encryptions, and the memory complexity $2^{3n/4}$ blocks of $n/2$ bits.
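The tradeoff can be tabulated with a few lines of Python. The sketch uses the expressions from the text: data $2^{n - x' + 1}$ and precomputation time and memory $2^{x' + n/2 + b}$, where the summary above drops the small constant factor $2^b$. The function name and the even-$n$ assumption are illustrative.

```python
def feistel2_costs_log2(n, x_prime, b=2):
    """log2 of data, time and memory for the 6-round Feistel-2 attack:
    data 2^{n - x' + 1}, precomputation (time and memory) 2^{x' + n/2 + b}.
    n is assumed to be even."""
    data = n - x_prime + 1
    precomputation = x_prime + n // 2 + b
    return {"data": data, "time": max(data, precomputation), "memory": precomputation}
```

With $n = 128$ and $x' = n/4 = 32$, ignoring $2^b$ gives data $2^{97}$, time $2^{97}$ and memory $2^{96}$, i.e. all three are about $2^{3n/4}$ up to small constant factors.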

4 Key-Recovery Attacks against Feistel-3 Construction 

In this section, we present a 10-round key-recovery attack on the Feistel-3 construction with $k = n$. In the attack, we assume that different S-Boxes are used for different words in a given round, but that they are the same across all of the rounds. Recall that all the S-Boxes operate on $c$-bit words, and thus there are $n/(2c)$ words per branch. We consider that the P-layer is identical for all rounds and that it has the maximal branch number of $n/(2c) + 1$. The extensions of the attack to 12 and 14 rounds for key sizes of $k = 3n/2$ and $k = 2n$, respectively, and the analysis of a class of P-layers that do not necessarily have a maximal branch
number are given in the full version of the paper D3I- 

The 10-round key-recovery attack is based on a non-ideal behavior of 7 rounds of Feistel-3. We first present the 7-round distinguisher in the proposition below, and then use it to launch a key-recovery attack on a 10-round Feistel-3 primitive whose inner rounds are the ones from the distinguisher. To construct the distinguisher, we first apply an equivalent transformation to the 7-round primitive, as shown in Figure 7. Namely, the P-layer of round $i+6$ is removed from this round, and linear transformations are added at three different positions in order to obtain a primitive that is computationally equivalent to the original one. Hereafter, $v'_{i+7}$ represents the value of $P^{-1}(v_{i+7})$. We use the non-ideal behavior of the new representation to mount the 10-round key-recovery attack by extending the 7-round differential by one round at the beginning and two rounds at the end. The newly introduced $P$ after $v'_7$ is addressed later, in the key-recovery part.

As in the previous section, $F_i^I$ and $\Delta F_i^I$ denote the input value and input difference of the $i$-th round, respectively, that is, the input to the S-layer in $F_i$. Similarly, $F_i^M$ and $\Delta F_i^M$ refer to the state value and state difference after the S-layer, that is, between the S-layer and the P-layer of $F_i$, and $F_i^O$ and $\Delta F_i^O$ denote the output value and output difference of the P-layer in $F_i$, respectively. For the branch-wise difference, we use 0 to refer to a branch with no active words, 1 to the case when only a single pre-specified word is active, and $P$ and $P^{-1}$ for the branch-wise differences obtained after 1 has been processed by $P$ and $P^{-1}$, respectively.
Fig. 8. 10-round key recovery for $k = n$


Finally, $X[1]$ and $\Delta X[1]$, respectively, denote the pre-specified active-word value and difference of a branch-wise variable $X$.

The technique used to construct the 7-round distinguisher (described in the proposition below) is very similar to the technique we used in the distinguisher on five rounds of Feistel-2. In other words, we first show that if a pair $(m, m')$ of plaintexts follows a particular differential characteristic, then the number of possible internal state values that correspond to $m$ is limited. Based on this, we can introduce a difference in the plaintext and predict the output difference in the ciphertext. Again, we introduce many pairs of plaintexts where each right half differs by $\delta_j$ (and thus get a $b$-$\delta$-sequence), and observe that the pairs of ciphertexts have a predictable difference. Unlike the proposition for Feistel-2, where we observed the difference in the left half of the ciphertext, for Feistel-3 we check the difference in one word of the right half of the ciphertext pairs (the position of this particular word plays no role in the analysis). That is why we have to redefine $\mathcal{F}_\Delta(m, \delta_j)$. To avoid bulky notation, we define it informally as the one-word difference in the right half of the ciphertext pair produced by the encryption of a plaintext pair $(m, m \oplus 0 \| \delta_j)$ through 7-round Feistel-3. In Figure 7, this is the ciphertext difference in the word $v'_{i+7}$.

Proposition 2. Let $(m, m')$ be a pair of plaintexts that conforms to the 7-round differential $(0, 1) \to (1, 0)$ shown in Figure 7, and let $\delta_j = 1, 2, \ldots, 2^b - 1$ form a $b$-$\delta$-sequence. Then, the sequence $\mathcal{F}_\Delta(m, \delta_j)$, $\delta_j = 1, \ldots, 2^b - 1$, can assume only $2^{n/2 + 4c}$ possible values.

Proof. We show here that the number of internal state values for pairs satisfying the 7-round differential in Figure 7 is at most $2^{n/2 + 4c}$. Namely, we show that they can be parameterized by five nonzero differences in five $c$-bit words (marked by circles in Figure 7), and by the values of the $n/2 - c$ inactive bits of $F^I_{i+4}$ (marked by a star in Figure 7).

We first assume that the five word differences circled in Figure 7 are fixed, that is, $\Delta F^I_{i+2}[1]$, $\Delta F^M_{i+2}[1]$, $\Delta F^I_{i+4}[1]$, $\Delta F^I_{i+6}[1]$ and $\Delta F^M_{i+6}[1]$ are fixed to random nonzero values. When $\Delta F^I_{i+2}$ and $\Delta F^M_{i+2}$ are fixed, we expect one value on average to be determined for $F^I_{i+2}[1]$. In Figure 7, a state in which the value is fixed in only one word is represented by dotted lines. Then, the corresponding $\Delta F^O_{i+2} = \Delta v_{i+3} = \Delta F^I_{i+3}$ can be fully computed linearly as $P(\Delta F^M_{i+2})$. Since the branch number of $P$ is $n/(2c) + 1$, $P(\Delta F^M_{i+2})$ is fully active. Similarly, when $\Delta F^I_{i+6}$ and $\Delta F^M_{i+6}$ are fixed, one value on average can be determined for $F^I_{i+6}[1]$, and the corresponding fully active difference $\Delta v_{i+5} = \Delta F^O_{i+6}$ can also be computed linearly as $P(\Delta F^M_{i+6})$. Then, $\Delta F^O_{i+4}$ is computed as $\Delta v_{i+3} \oplus \Delta v_{i+5}$, where both $\Delta v_{i+3}$ and $\Delta v_{i+5}$ are of type $P$. Since $P$ is linear, $\Delta F^O_{i+4}$ also has the form $P$, which implies that $\Delta F^M_{i+4}$ has the form $P^{-1}(P) = 1$. Then, the middle difference $\Delta F^M_{i+4}$ is also fixed. When $\Delta F^I_{i+4} \neq \Delta F^I_{i+2}$ and $\Delta F^I_{i+4} \neq \Delta F^I_{i+6}$, the corresponding differences $\Delta F^O_{i+3}$ and $\Delta F^O_{i+5}$ are computed by simply taking their XOR. Thus, both $\Delta F^O_{i+3}$ and $\Delta F^O_{i+5}$ are of type 1, which makes $\Delta F^M_{i+3}$ and $\Delta F^M_{i+5}$ fully active (denoted by $P^{-1}$). Then, the values of $F^I_{i+3}$ and $F^I_{i+5}$ are uniquely determined, as well as the values of $F^I_{i+4}[1]$ and $F^M_{i+4}[1]$.

Finally, when we additionally consider the $n/2 - c$ inactive bits of $F^I_{i+4}$, marked by a star in Figure 7, as fixed, along with the already-fixed $c$ bits of the active word 1, the full $n/2$-bit values of $F^I_{i+4}$ and $F^M_{i+4}$ are determined. In summary, for each value of the five $c$-bit active differences circled in Figure 7 and of the $n/2 - c$ inactive bits of $F^I_{i+4}$, all the differences of the differential, as well as the one-word values in rounds $i+2$ and $i+6$ and all state values in rounds $i+3$, $i+4$ and $i+5$, are uniquely fixed.

For each value of the $5c + n/2 - c = n/2 + 4c$ parameter bits, we can partially evaluate a $b$-$\delta$-set on $v_i$ up to $\Delta v'_{i+7}[1]$. Namely, for one member of the pair, $v_i[1]$ is modified so that $\Delta v_i[1]$ becomes $\delta_j$. The modification changes the difference in subsequent rounds, but we can still compute the corresponding difference $\Delta v'_{i+7}[1]$ without requiring knowledge of the subkey bits.

Indeed, in round $i+1$, $\Delta F^I_{i+1} = 0$ and $\Delta v_{i+2} = \Delta F^I_{i+2} = \delta_j$. In round $i+2$, from the original active-word value of $F^I_{i+2}$ and the updated difference $\Delta F^I_{i+2} = \delta_j$, the updated $\Delta F^O_{i+2}$ can be computed as $P \circ S(F^I_{i+2}) \oplus P \circ S(F^I_{i+2} \oplus \delta_j)$. This also yields the updated differences $\Delta v_{i+3}$ and $\Delta F^I_{i+3}$. Then, in rounds $i+3$ to $i+5$, from the original value and the updated difference of $F^I_x$, the updated difference $\Delta F^O_x$, and moreover the updated differences $\Delta v_{x+1}$ and $\Delta F^I_{x+1}$, can be computed for $x = i+3, i+4, i+5$. Note that, in round $i+4$, $\Delta F^I_{i+4}$ originally has only one active word, while the updated difference is fully active. Because the $n/2 - c$ inactive bits of $F^I_{i+4}$ are parameters, and thus known to the attacker, $\Delta F^O_{i+4}$ can be computed in all words. Finally, in round $i+6$, the updated difference $\Delta v_{i+6}$ is known in all words, while the original value is known only in one active word. Since the position of the P-layer is moved, the attacker can still compute the 1-word updated difference $\Delta v'_{i+7}[1]$.
To conclude, for each of the $2^{n/2 + 4c}$ possible values of the parameters, the sequence of $\Delta v'_{i+7}[1]$ is uniquely obtained by computing $\Delta v'_{i+7}[1]$ for all $\delta_j$ in $\Delta v_i[1]$, which concludes the proof. □
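The difference-update step used in the proof, e.g. $\Delta F^O_{i+2} = P \circ S(F^I_{i+2}) \oplus P \circ S(F^I_{i+2} \oplus \delta_j)$, only needs the value of the active word: inactive words cancel in the S-layer, and $P$ is linear. The Python sketch below checks this on a toy round function. Several choices are assumptions made for illustration only: the 4-bit S-box (taken from PRESENT), the 4-word branch, and the XOR-based $P$, which is linear and invertible but does not have the maximal branch number assumed in the text. Key addition is omitted.

```python
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # PRESENT S-box, used as a toy example


def P(words):
    """Toy linear layer on 4-bit words: output word k is the XOR of all input
    words except word k. Linear, and an involution for an even number of
    words, but NOT of maximal branch number."""
    total = 0
    for w in words:
        total ^= w
    return [total ^ w for w in words]


def F(words):
    """Toy SPN round function P o S (key addition omitted)."""
    return P([SBOX[w] for w in words])


def output_difference_one_word(x_active, delta, pos, num_words=4):
    """Output difference of F when only word `pos` carries the difference
    delta, computed from that single word's value x_active: by linearity of
    P it is P applied to S(x) ^ S(x ^ delta) placed at `pos`, zero elsewhere."""
    mid = [0] * num_words
    mid[pos] = SBOX[x_active] ^ SBOX[x_active ^ delta]
    return P(mid)
```

The check compares this one-word computation against a full evaluation of both members of the pair, which is what the attacker cannot do in the real setting, since the other word values are unknown.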

10-Round Key-Recovery Attack. Let us describe the 10-round key-recovery attack that uses the 7-round distinguisher. As shown in Figure 8, we extend the 7-round differential characteristic of the distinguisher by one round at the beginning and two rounds at the end (the analysis and complexity would be similar if we extended it by two rounds at the beginning and one at the end). Recall that the additional P-layer after $v'_7$, introduced by the distinguisher, has to be addressed in the key-recovery part. We also note that the active word 1 in the branches can be located in any position, but the position has to be fixed beforehand to be able to conduct the attack. The P-layer in round 8 is moved to two different positions, as shown in Figure 8. The newly introduced $P^{-1}$ transformation and the $P$ transformation after $v'_7$ generated by the distinguisher cancel each other, so we ignore them. Similarly to the analysis for Feistel-2, the attack consists of three parts: the precomputation phase, followed by the data collection, and finally the meet-in-the-middle check to detect correct subkey candidates.

Precomputation. Given the proof of Proposition 2, the precomputation phase is straightforward. For each of the $2^{n/2 + 4c}$ values of the parameters, and for any value of $\delta_j$ introduced at $v_0$, the corresponding $\Delta v'_7[1]$ can be computed easily, as shown in Algorithm 5. As in the attack on Feistel-2, in this phase we construct the meet-in-the-middle table $T_\delta$ that contains all the sequences of differences in $\Delta v'_7[1]$ for $2^b < 2^c$ nonzero differences $\delta_j$ in $v_0$. The computational cost is about $2^{n/2 + 4c}$ encryptions, as the parameter $b$ is relatively small and we consider only a small fraction of all the rounds. Storing $T_\delta$ requires $2c/n \times 2^{n/2 + 4c + b}$ blocks of $n/2$ bits, as each sequence contains $2^b$ elements of $c$ bits.

Collecting Pairs. To launch the attack, we need a pair that satisfies the 7-round differential characteristic in Figure 8, i.e. the plaintext difference $(1, P)$ should propagate to the ciphertext difference $(P, \Delta)$, where $\Delta$ is a truncated difference. The probability that the plaintext difference $(1, P)$ becomes $(0, 1)$ after the first round is $2^{-c}$, while the probability that the ciphertext difference $(P, \Delta)$ becomes $(1, 1)$ after inversion of the last round is $2^{-n/2 + c}$, and that it becomes $(1, 0)$ after another inverse round is $2^{-c}$. Therefore, a random pair verifying the plaintext difference $(1, P)$ conforms to the inner 7-round differential with probability $2^{-n/2 - c}$. Hence, we need to collect $2^{n/2 + c}$ pairs satisfying the differential $(1, P) \to (P, \Delta)$. Among them, one is expected to satisfy $(\Delta v_1, \Delta v_0) = (0, 1)$ and $(\Delta v_8, \Delta v_7) = (1, 0)$. The procedure is given in Algorithm 6.
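The probability of the inner differential is the product of the three conditions listed above. In logarithms, the exponents add, as in this small Python sketch (the function name is illustrative):

```python
def inner_differential_log2(n, c):
    """log2 probability that a pair with plaintext difference (1, P) follows
    the inner 7-round differential (n is assumed to be even)."""
    first_round = -c              # (1, P) -> (0, 1) after round 0
    last_round = -(n // 2) + c    # (P, Delta) -> (1, 1) after inverting the last round
    second_last_round = -c        # (1, 1) -> (1, 0) after one more inverse round
    return first_round + last_round + second_last_round
```

The result is $-n/2 - c$, so about $2^{n/2+c}$ filtered pairs are needed; for $n = 128$ and $c = 8$ this is $2^{72}$.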

For fixed values of the inactive bits in $v_0$ and $v_{-1}$, about $2^{4c}$ pairs can be generated, and we expect approximately $2^{4c} \cdot 2^{-n/2 + c} = 2^{-n/2 + 5c}$ of them to verify the truncated ciphertext difference $(P, \Delta)$. By iterating the procedure for $2^{n - 4c}$ different values, we obtain $2^{n - 4c - n/2 + 5c} = 2^{n/2 + c}$ pairs satisfying the desired $(\Delta v_0, \Delta v_{-1})$ and $(\Delta v_9, \Delta v_{10})$. The data complexity required to generate the $2^{n/2 + c}$ pairs amounts to approximately $2^{2c + n - 4c} = 2^{n - 2c}$ chosen plaintexts,
Algorithm 5. Construction of the difference sequences of $\Delta v'_7[1]$ (precomputation)

1: for all $2^{n/2 + 4c}$ values of the parameters do
2:     Derive all differences of the differential.
3:     Derive the 1-word state values in rounds 2 and 6.
4:     Derive all state values in rounds 3, 4 and 5.
5:     for $2^b$ different differences in $v_0$ do
6:         Modify $\Delta v_0[1]$, and update the corresponding sequence of $\Delta v'_7[1]$.
7:     Insert the sequence of $\Delta v'_7[1]$ in the table $T_\delta$.


Algorithm 6. Data collection for the 10-round attack

1: Fix the $n/2 - c$ inactive bits of $v_0$ and $v_{-1}$.
2: for all $2^{2c}$ choices of $(v_0, v_{-1})$ do
3:     Query $(v_0, v_{-1})$ to obtain $(v_9, v_{10})$.
4:     Store $(v_9, v_{10})$ in a hash table indexed by the wanted inactive bits in $P^{-1}(v_9)$.
5: Construct about $2^{4c} / 2^{n/2 - c} = 2^{-n/2 + 5c}$ pairs verifying the truncated ciphertext difference.
6: Iterate the analysis $2^{n - 4c}$ times by changing the inactive-bit value of $v_0$ and $v_{-1}$.


the computational cost is equivalent to $2^{n - 2c}$ memory accesses, and the memory requirement is about $2^{n/2 + c}$ blocks of $n/2$ bits.
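The counting in the data-collection paragraph can be replayed in log2 form. The sketch below follows the text: a structure of $2^{2c}$ plaintexts gives about $2^{4c}$ pairs, the truncated filter keeps a $2^{-n/2+c}$ fraction, and $2^{n-4c}$ structures are used. The function name and dictionary keys are illustrative.

```python
def data_collection_log2(n, c):
    """log2 counts for the 10-round data collection (n assumed even)."""
    pairs_per_structure = 4 * c                     # 2^{2c} plaintexts give about 2^{4c} pairs
    surviving = pairs_per_structure - (n // 2 - c)  # pairs matching (P, Delta): -n/2 + 5c
    structures = n - 4 * c                          # number of inactive-bit settings used
    return {
        "surviving_per_structure": surviving,
        "pairs": structures + surviving,            # n/2 + c
        "chosen_plaintexts": structures + 2 * c,    # n - 2c
    }
```

For $n = 128$ and $c = 8$ this yields $2^{72}$ pairs from $2^{112}$ chosen plaintexts.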

Detecting Subkeys. For each of the $2^{n/2 + c}$ obtained pairs, we derive $2^c$ candidates for $n/2 + 2c$ bits of key material, namely $K_0[1]$, $K_8[1]$, and $K_9$. For each pair, we first guess the 1-word difference $\Delta v_8[1]$. Then, we assume the differential characteristic is satisfied, i.e. $\Delta v_1 = 0$, $\Delta v'_7 = 0$, and $\Delta v_8 = 1$. This fixes the input and output differences for the active words in rounds 0 and 8, and for all words in round 9. Then, the possible inputs for each of these S-Boxes can be reduced to a single value, and the corresponding subkeys $K_0[1]$, $K_8[1]$ and $K_9$ can be calculated.

Finally, we construct the $b$-$\delta$-set by modifying $v_0[1]$. For each modified plaintext, with the knowledge of $K_0[1]$, we modify $v_{-1}$ such that $v_1$ remains unchanged. From the corresponding ciphertexts, with the knowledge of $K_9$ and $K_8[1]$, we compute the sequence of $2^b$ differences $\Delta v'_7[1]$. If it matches one of the entries in the precomputed table $T_\delta$, then the guessed subkeys $K_0[1]$, $K_8[1]$ and $K_9$ are correct with high probability; otherwise they are wrong. When the values of $c$ and $n$ are in a particular range (see below), only the right guess remains, and thus the subkeys are recovered.

The computational cost of the key-recovery phase is that of computing $\Delta v'_7[1]$ for $2^{n/2 + c}$ pairs, $2^c$ guesses for $\Delta v_8[1]$, and $2^b$ choices of $\delta_j$ in the $b$-$\delta$-set, which is upper bounded by $2^{n/2 + 3c}$ encryptions.

Complexity Analysis and Constraints on $(n, c)$. As shown above, the data complexity is $2^{n - 2c}$ chosen plaintexts, the time complexity is equivalent to $2^{n - 2c} + 2^{n/2 + 5c}$ encryptions, and the memory complexity is $2^{n/2 + 5c}$ blocks of $n/2$ bits. We note that the overall complexity is balanced when $n/(2c) = 7$, i.e. when a branch includes 7 S-Boxes. It is possible to achieve a simple tradeoff where only a fraction $1/2^c$ of all the sequences are stored in $T_\delta$, which decreases the memory complexity to $2^{n/2 + 4c}$ blocks of $n/2$ bits, but in turn increases the data complexity and the time complexity of the online phase by a factor $2^c$, as we have decreased the chance to hit one element in $T_\delta$. With this tradeoff, the data complexity becomes $2^{n - c}$ chosen plaintexts, and the time complexity becomes about $2^{n - c} + 2^{n/2 + 4c}$ encryptions, which is balanced for $n/(2c) = 5$ S-Boxes per branch.

Moreover, to launch the attack, a branch must have at least 5 S-Boxes, so that $n/2 + 4c < n$. Additionally, in the subkey detection phase, the number of remaining key candidates should be one, or small enough. The number of sequences in $T_\delta$ is $2^{n/2 + 4c}$ and the number of candidates derived online is $2^{n/2 + 2c}$. Thus, in total, $2^{n + 6c}$ matches are examined, whether or not we use the tradeoff. In theory, there exist $2^{c \cdot 2^b}$ sequences from $b < c$ bits to $c$ bits. Hence, the condition to extract only the correct subkey is $n + 6c - c \cdot 2^b < 0$, which gives $b > \log_2(6 + n/c)$. Since $2^b < 2^c$, by combining the two conditions, the valid range for $(n, c)$ is $10c < n < c(2^c - 6)$. For example, 128-bit block ciphers with 8-bit S-Boxes and 80-bit block ciphers with 5-bit S-Boxes can be attacked.
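The range condition can be evaluated directly. The Python sketch below encodes the two inequalities from the text; the upper bound $n < c(2^c - 6)$ is exactly the statement that the required threshold $\log_2(6 + n/c)$ stays below $c$. The function names are illustrative.

```python
import math


def b_lower_bound(n, c):
    """b must exceed log2(6 + n/c), from n + 6c - c * 2^b < 0."""
    return math.log2(6 + n / c)


def attack_applies(n, c):
    """Valid range stated in the text: 10c < n < c(2^c - 6)."""
    return 10 * c < n < c * (2 ** c - 6)
```

Both examples from the text fall in the range: for $(n, c) = (128, 8)$ and $(80, 5)$ the threshold is $\log_2 22 \approx 4.46$, below $c$ in both cases. By contrast, a 64-bit cipher with 4-bit S-Boxes is outside it, since $c(2^c - 6) = 40 < 64$.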

Another possible tradeoff is the one used to achieve the best attacks on reduced variants of the AES in [10]. Adding a second active word at the beginning of the differential characteristic reduces the data complexity while keeping the same overall complexity. This tradeoff is possible as long as there are at least 7 words per branch, i.e. $n/(2c) \geq 7$. The main advantage of adding an active word is to increase the size of the structures of plaintexts from $2^{2c}$ to $2^{4c}$, which allows us to construct about $2^{8c}$ input pairs already verifying the input difference. The precomputation requires $2^{n/2 + 6c}$ encryptions and a memory of $2c/n \times 2^{n/2 + 6c + b}$ blocks of $n/2$ bits; the online phase requires more pairs, namely $2^{n/2 + 2c}$, but this is achieved with less data: only $2^{n - 3c}$ chosen plaintexts. Therefore, the final time complexity is $2^{n - 3c} + 2^{n/2 + 6c}$ for both the encryption of the data and the precomputation. This yields an attack as long as $n/2 + 6c < n$, which holds for $n/(2c) \geq 7$ S-Boxes. For example, with 8 S-Boxes per branch, the attack without the second active word requires $2^{14n/16}$ chosen plaintexts, $2^{14n/16}$ encryptions and a memory of about $2^{12n/16}$ blocks of $n/2$ bits; hence the overall complexity is $2^{14n/16}$. For the same primitive, but with an additional active word, the tradeoff gives an attack that requires the same overall time complexity, while the data complexity is reduced to $2^{13n/16}$ chosen plaintexts.
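The 8-S-Box example can be reproduced with exact fractions of $n$. The sketch below uses the data and precomputation expressions given above for the attack with and without the second active word, with $c = n / (2 \cdot \text{S-Boxes per branch})$. The function name is illustrative.

```python
from fractions import Fraction


def exponents(sboxes_per_branch, second_active_word):
    """Data and time exponents, as fractions of n, for the 10-round attack
    with c = n / (2 * sboxes_per_branch)."""
    c = Fraction(1, 2 * sboxes_per_branch)
    if second_active_word:
        data, precomputation = 1 - 3 * c, Fraction(1, 2) + 6 * c
    else:
        data, precomputation = 1 - 2 * c, Fraction(1, 2) + 5 * c
    return {"data": data, "time": max(data, precomputation)}
```

With 8 S-Boxes per branch, both variants have time exponent $14/16$, while the data exponent drops from $14/16$ to $13/16$ with the second active word. With 7 S-Boxes and a single active word, data and precomputation coincide, matching the balance point $n/(2c) = 7$ noted earlier.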

5 Conclusion 

With the use of the meet-in-the-middle technique, we have shown the best known 
generic attacks on balanced Feistel ciphers. As we imposed very small restrictions 
on the round functions, our attacks are applicable to almost all balanced Feistels. 
Such ciphers, with an arbitrary round function and a double-length key, are insecure for up to 10 rounds. In the case when the round function is an SPN, for a large class of linear P-layers, the attacks penetrate 14 rounds and recover all the subkeys. We have also verified the attacks experimentally, supporting our claims.

Our results give insights into the lower bound on the number of rounds a secure Feistel should have. They suggest that this number should be surprisingly high in the case of SPN round functions. Furthermore, from the attacks on Feistel-2, we show that as long as the ratio of key size to state size increases, the number of rounds that can be attacked will grow, while the data complexity always stays below the full codebook. Thus, we have shown that a block cipher designer cannot fix a priori the number of rounds of a balanced Feistel and allow any (or a very large) key size, as for each increment of the key by an amount of bits equal to the state size, we can attack four more rounds.

We have analyzed generic constructions and as such, we could not make any 
assumptions about the particular details of the ciphers, e.g. the key schedule, 
the permutation layer, etc. However, the attacks on the AES have shown that 
it is possible to take advantage of the cipher details in order to penetrate more 
rounds. Thus, we believe that our analysis can be used as a first step for attacks on a larger number of rounds of specific Feistel ciphers.
Abstract. In FSE 2007, Ristenpart and Rogaway described a generic method, XLS, to construct a length-preserving strong pseudorandom permutation (SPRP) over bit-strings of size at least $n$. It requires a length-preserving permutation $\mathcal{E}$ over all bit strings whose size is a multiple of $n$, and a blockcipher $E$ with block size $n$. The SPRP security of XLS was proved from the SPRP assumptions of both $\mathcal{E}$ and $E$. In this paper we disprove the claim by demonstrating an SPRP distinguisher of XLS which makes only three queries and has distinguishing advantage about 1/2. XLS uses a multi-permutation linear function, called mix2. In this paper, we also show that if we replace mix2 by any invertible linear function, the construction XLS still remains insecure. Thus the mode has an inherent weakness.


Keywords: XLS, SPRP, Distinguishing Advantage, length-preserving 
encryption. 


1 Introduction 

The notion of domain extension arises in many areas of cryptography, such as hash functions, pseudorandom functions (PRF), strong pseudorandom permutations (SPRP) [12], etc. Usually, we design a building block defined for a small, fixed bit-size domain. Then, by applying the building block iteratively, we obtain a similar kind of function defined over an arbitrary domain. For example, a blockcipher defined on $n$ bits can be used to define an encryption algorithm that can encrypt any message whose size is a multiple of $n$. To define a ciphertext for a message whose size is not a multiple of $n$, one can first apply some padding rule to make the size of the (padded) message a multiple of $n$. This method cannot preserve length, as it expands the ciphertext. A length-preserving encryption is called an enciphering scheme. The length-preserving property makes the task more difficult and more restricted than length-expanding encryption. On the other hand, designing enciphering schemes over all bit strings whose size is a multiple of the block size (i.e., $n$) seems to be easier than defining them over arbitrary bit strings.
Many such enciphering schemes have been defined fims] . 

Non-Generic Methods. There are several known methods for turning a blockcipher into an enciphering scheme over arbitrary bit strings. One can apply the underlying blockcipher twice and use the intermediate output as a one-time pad for the partial block (as in EME [7], TET [8], HEH [16], etc.); the

P. Sarkar and T. Iwata (Eds.): ASIACRYPT 2014, PART I, LNCS 8873, pp. 478-490, 2014.

© International Association for Cryptologic Research 2014


XLS is Not a Strong Pseudorandom Permutation 




other constructions, e.g., HCTR [17], HCH [3] and XCB [13], use the hash-then-counter paradigm. A standard trick like ciphertext stealing can also be applied to specific
constructions (e.g., AEZ HD- However, all these approaches are not generic. 

We call a method a domain completion (or generic domain completion) if it converts any enciphering scheme over bit strings whose size is a multiple of $n$ into an enciphering scheme over arbitrary bit strings (possibly of size at least $n$).

To the best of our knowledge, only two domain completion methods have been proposed so far.

1. A popular, efficient and neatly defined domain completion method is XLS (extended by Latin Square), designed by Ristenpart and Rogaway [15]. The design rationale of XLS is similar to that of the elastic blockcipher, as both follow the encrypt-then-mix paradigm.

2. Following hash-counter- hash paradigm, Nandi proposed a domain comple- 
tion method in m • 

In addition to these, a heuristically described method, called Elastic blockcipher [4], was proposed by Cook, Yung and Keromytis. Later, the elastic blockcipher, defined over all bit strings of sizes between $n$ and $2n$, was more formally defined

in 0- 

Applications of Domain Completion Methods. While we are primarily interested in the theoretical question of how to obtain domain extension for ciphers, arbitrary-input-length enciphering is a problem with many applications. A well-known application is disk-sector encryption, in which the sizes of the ciphertext and the plaintext remain the same as the sector size of the disk. In general, an enciphering scheme is easy to define for input sizes that are multiples of $n$ (the block size of the underlying blockcipher). Domain completion methods can be used to define enciphering schemes for arbitrary bit strings. They are also used in other symmetric-key algorithms such as authenticated encryption. For example, XLS is widely adopted in many authenticated encryption schemes, e.g. AES-COPA [2], Deoxys, Joltik, KIASU,
Marble, SHELL etc. p. 


1.1 Our Contribution 

In this paper we demonstrate a chosen plaintext-ciphertext adversary (CPCA) distinguisher against XLS (see Algorithm $\mathcal{A}_0$ in Section 3.2). The attack makes only three encryption and decryption oracle queries in an interleaved manner and has distinguishing advantage about 1/2. Thus, the security claim of XLS is wrong.

XLS uses a linear multi-permutation mix2 (which is very efficiently computable) that satisfies a certain property. The authors called any linear permutation satisfying this property a good mixing function. A natural remedy for XLS would be to replace mix2 by another good mixing function or by some other, stronger linear permutation. Unfortunately, we show that these remedies would not work.
To establish our claim, we consider a generalized version of XLS (which we call GXLS) that applies arbitrary linear permutations instead of mix2. Moreover, we consider the keys of the two invocations of the underlying blockcipher $E$ to be independently chosen. We demonstrate similar CPCA distinguishers (in Section 4) for GXLS with advantage at least 1/4. We therefore conclude that XLS has design flaws in its mode, not in the choice of the linear mixing functions.


2 XLS and Its General Form GXLS 


Basics and Notation 


1. An $s$-bit string $X$ is denoted as $X = X[1]X[2]\cdots X[s]$, where $X[i] \in \{0, 1\}$. We denote $X[i..j] = X[i] \cdots X[j]$ and $|X| = s$.

2. A length-preserving function $f$ satisfies $|f(X)| = |X|$ for all $X$.

3. Any linear function from $\{0,1\}^s$ to $\{0,1\}^t$ can be represented by a $t \times s$ binary matrix. Let $\mathrm{rol}(X)$ represent left circular bit-rotation, that is, for any bit-string $X := X[1]X[2]\cdots X[s]$ of length $s$, let $\mathrm{rol}(X) = X[2]X[3]\cdots X[s]X[1]$. Note that rol is a linear invertible function and is represented by the following $s \times s$ invertible matrix:

$$
R = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1 \\
1 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}
= \begin{pmatrix} \mathbf{0} & I_{s-1} \\ 1 & \mathbf{0} \end{pmatrix}.
$$

Here, $I_{s-1}$ represents the identity matrix of size $s - 1$.

4. Throughout the paper, let n be a fixed integer representing the block-size of 
the underlying blockcipher E. 
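To make the correspondence between rol and the matrix $R$ concrete, here is a small Python sketch (our own toy check, not code from the paper). It assumes an encoding in which an $s$-bit string is stored as an integer with $X[1]$ as the most significant bit, and verifies that multiplying by $R$ over GF(2) agrees with rol; all function names are hypothetical.

```python
import random

def rol(x: int, s: int) -> int:
    """Left circular rotation by one bit: X[1]X[2]...X[s] -> X[2]...X[s]X[1].
    Bit strings are ints with X[1] as the most significant bit (assumed encoding)."""
    mask = (1 << s) - 1
    return ((x << 1) | (x >> (s - 1))) & mask

def to_bits(x: int, s: int) -> list:
    """Return [X[1], ..., X[s]]."""
    return [(x >> (s - 1 - i)) & 1 for i in range(s)]

def from_bits(bits: list) -> int:
    x = 0
    for b in bits:
        x = (x << 1) | b
    return x

def rol_matrix(s: int) -> list:
    """The s x s matrix R: row i has its single 1 in column i+1 (0-indexed, cyclically)."""
    return [[1 if j == (i + 1) % s else 0 for j in range(s)] for i in range(s)]

def gf2_matvec(m: list, v: list) -> list:
    """Matrix-vector product over GF(2)."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in m]

s = 8
R = rol_matrix(s)
rng = random.Random(0)
for _ in range(200):
    x = rng.getrandbits(s)
    assert from_bits(gf2_matvec(R, to_bits(x, s))) == rol(x, s)
```

Since $R$ is a permutation matrix with $R^s = I$, its inverse $R^{-1} = R^{s-1}$ is simply a right rotation by one bit.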


2.1 XLS and GXLS on $\{0,1\}^{2n-1}$

In this section we describe how XLS is defined for bit strings of size $2n-1$. Later we show a distinguishing attack on XLS that makes queries of size $2n-1$ only. We refer readers to the original paper for the definition of XLS over arbitrary bit strings. We first define a linear function $\mathrm{mix2} : \{0,1\}^{2n-2} \to \{0,1\}^{2n-2}$ as below:

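$$\mathrm{mix2}(A, B) = (A \oplus \mathrm{rol}(A \oplus B),\; B \oplus \mathrm{rol}(A \oplus B)) = ((R+I)\cdot A + R\cdot B,\; R\cdot A + (R+I)\cdot B) = \begin{pmatrix} R+I & R \\ R & R+I \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix}$$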

where $|A| = |B| = n-1$, $R$ is the matrix of rol on $n-1$ bits, and $I$ is the identity matrix of size $n-1$. It is easy to see that the inverse of the linear map mix2 is itself: over GF(2) we have $(R+I)^2 + R^2 = I$ and $(R+I)R + R(R+I) = 0$. Such a permutation is also called an involution. Now we describe the XLS algorithm over the set of all $(2n-1)$-bit strings based on two $n$-bit (random) permutations $E$ and $\mathcal{E}$ and the linear permutation mix2. We note that we express the input, output and intermediate variables with notation different from that of [15]; this notation will be used to describe our attack and analysis.

Fig. 2.1. Illustration of (1) XLS, (2) GXLS and (3) the 3-round Elastic blockcipher. XLS and the 3-round Elastic blockcipher are special cases of GXLS.
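The involution property can also be checked directly. In the minimal sketch below (a toy word size $n - 1 = 15$ and integers as bit strings, both our assumptions), note that mix2 XORs the same value $t = \mathrm{rol}(A \oplus B)$ into both halves, so the XOR of the two output halves is again $A \oplus B$; applying mix2 a second time therefore recomputes the same $t$ and cancels it.

```python
import random

def rol(x: int, w: int) -> int:
    """Left circular rotation by one bit of a w-bit value (first bit = most significant)."""
    return ((x << 1) | (x >> (w - 1))) & ((1 << w) - 1)

def mix2(a: int, b: int, w: int) -> tuple:
    """mix2(A, B) = (A xor rol(A xor B), B xor rol(A xor B)) on w-bit halves."""
    t = rol(a ^ b, w)
    return a ^ t, b ^ t

w = 15  # toy value of n - 1
rng = random.Random(42)
for _ in range(1000):
    a, b = rng.getrandbits(w), rng.getrandbits(w)
    assert mix2(*mix2(a, b, w), w) == (a, b)  # applying mix2 twice is the identity
```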


Algorithmic Definitions of XLS and GXLS. Now we describe the algorithms XLS and GXLS, which are defined on $(2n-1)$-bit strings.

Algorithm XLS
Input: $(P, Q) \in \mathbb{F}_2^n \times \mathbb{F}_2^{n-1}$.
Output: $(C, D) \in \mathbb{F}_2^n \times \mathbb{F}_2^{n-1}$.
01 $E(P) = a \| A$, $a \in \mathbb{F}_2$.
02 $u = !a$, $(U, W) = \mathrm{mix2}(A, Q)$.
03 $\mathcal{E}(u \| U) = v \| V$.
04 $b = !v$, $(B, D) = \mathrm{mix2}(V, W)$.
05 $E(b \| B) = C$.
06 return $(C, D)$.

Algorithm GXLS$[\Pi_1, \mathrm{mix}_1, \Pi_2, \mathrm{mix}_2, \Pi_3]$
Input: $(P, Q) \in \mathbb{F}_2^n \times \mathbb{F}_2^{n-1}$.
Output: $(C, D) \in \mathbb{F}_2^n \times \mathbb{F}_2^{n-1}$.
01 $\Pi_1(P) = A$.
02 $(U, W) = \mathrm{mix}_1(A, Q)$.
03 $\Pi_2(U) = V$.
04 $(B, D) = \mathrm{mix}_2(V, W)$.
05 $\Pi_3(B) = C$.
06 return $(C, D)$.


Here ! denotes bit complement, $\mathrm{mix}_1$ and $\mathrm{mix}_2$ are two invertible linear functions on $2n-1$ bits, and mix2 is the invertible linear function on $\{0,1\}^{2n-2}$ described before. The $\Pi_i$'s are independent uniform random (or pseudorandom) permutations, whereas in XLS, $E$ and $\mathcal{E}$ are independent uniform random (or pseudorandom) permutations. We denote the generalized XLS by $\mathrm{GXLS}[\Pi_1, \mathrm{mix}_1, \Pi_2, \mathrm{mix}_2, \Pi_3](P, Q) = (C, D)$ as above (in the right-hand side of Fig. 2.1). Note that the XLS algorithm is nothing but $\mathrm{GXLS}[E, !\|\mathrm{mix2}, \mathcal{E}, !\|\mathrm{mix2}, E]$, where $(!\|f)(b, X) = !b \,\|\, f(X)$. In order for GXLS to be invertible, $\mathrm{mix}_1$ and $\mathrm{mix}_2$ must be invertible.
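For experimentation, the following Python sketch implements XLS on $(2n-1)$-bit inputs line by line as in the listing above, with $E$ and $\mathcal{E}$ replaced by independent random permutation tables on a toy block size $n = 8$ (an assumption; real XLS uses a keyed blockcipher, and the class and method names are hypothetical). Decryption runs the lines backwards, using that mix2 is an involution.

```python
import random

def rol(x: int, w: int) -> int:
    """Left circular rotation of a w-bit value by one bit (the first bit is the MSB)."""
    return ((x << 1) | (x >> (w - 1))) & ((1 << w) - 1)

def mix2(a: int, b: int, w: int) -> tuple:
    """mix2 on two w-bit halves; it is its own inverse."""
    t = rol(a ^ b, w)
    return a ^ t, b ^ t

def random_perm(bits: int, rng: random.Random) -> tuple:
    """A uniformly random permutation of {0,1}^bits and its inverse, as lookup tables."""
    fwd = list(range(1 << bits))
    rng.shuffle(fwd)
    inv = [0] * len(fwd)
    for x, y in enumerate(fwd):
        inv[y] = x
    return fwd, inv

class ToyXLS:
    """Hypothetical toy XLS on (P, Q), |P| = n, |Q| = n - 1, with random permutation tables."""

    def __init__(self, n: int, rng: random.Random):
        self.w = n - 1
        self.E, self.E_inv = random_perm(n, rng)
        self.calE, self.calE_inv = random_perm(n, rng)   # independent second permutation

    def split(self, y: int) -> tuple:
        """Split an n-bit value into (first bit, remaining n - 1 bits)."""
        return y >> self.w, y & ((1 << self.w) - 1)

    def encrypt(self, p: int, q: int) -> tuple:
        w = self.w
        a, A = self.split(self.E[p])                       # 01: E(P) = a || A
        U, W = mix2(A, q, w)                               # 02: u = !a, (U, W) = mix2(A, Q)
        v, V = self.split(self.calE[((a ^ 1) << w) | U])   # 03: calE(u || U) = v || V
        B, D = mix2(V, W, w)                               # 04: b = !v, (B, D) = mix2(V, W)
        C = self.E[((v ^ 1) << w) | B]                     # 05: E(b || B) = C
        return C, D                                        # 06

    def decrypt(self, c: int, d: int) -> tuple:
        w = self.w
        b, B = self.split(self.E_inv[c])
        V, W = mix2(B, d, w)                               # mix2 is an involution
        u, U = self.split(self.calE_inv[((b ^ 1) << w) | V])
        A, Q = mix2(U, W, w)
        return self.E_inv[((u ^ 1) << w) | A], Q

rng = random.Random(2014)
xls = ToyXLS(8, rng)
for _ in range(1000):
    p, q = rng.getrandbits(8), rng.getrandbits(7)
    assert xls.decrypt(*xls.encrypt(p, q)) == (p, q)
```

A block size of 8 bits is of course far too small for any security; it only keeps the tables short.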


2.2 Elastic Blockcipher 

The three-round Elastic blockcipher can also be viewed as $\mathrm{GXLS}[\Pi_1, \mathrm{mix3}, \Pi_2, \mathrm{mix3}, \Pi_3]$, where $\mathrm{mix3}(A, B) = (A', A[i_1]\cdots A[i_s])$, with $A'$ obtained from $A$ by replacing the bits $A[i_1]\cdots A[i_s]$ by $(A[i_1]\cdots A[i_s]) \oplus B$; here $|A| = n$, $|B| = s$ and $1 \le i_1 < \cdots < i_s \le n$ are some fixed integers (specific choices of these values depend on the underlying blockcipher). We illustrate this method in Fig. 2.1 for $i_1 = n-s+1, \ldots, i_s = n$. Its basic mix function can then be defined as $(X \| Y) \mapsto (X \oplus Y) \| X$, where $|X| = |Y| = s$. Similarly, four or more rounds can be defined. So all of these follow the encrypt-mix paradigm iterated over several rounds. We capture this paradigm for three iterations in GXLS. In the following sections, we prove that three rounds are not sufficient for achieving SPRP security.


3 Insecurity of XLS 

In this section we show that XLS is not an SPRP (strong pseudorandom permutation). In fact, we establish a distinguisher making only three oracle queries and having distinguishing advantage about 1/2. Moreover, by repeating this attack independently, we can amplify the distinguishing advantage close to one. We first briefly recall the security notions related to distinguishing advantages.


3.1 Security Definitions 

Let $R_i$ denote the uniform random function from $\{0,1\}^i$ to $\{0,1\}^i$, i.e., the uniform distribution on the set $\mathrm{Func}(\{0,1\}^i, \{0,1\}^i)$ of all functions from $\{0,1\}^i$ to itself. Given a set $L \subseteq \mathbb{N} := \{1, 2, \ldots\}$, we denote by $R^L$ the tuple $(R_i)_{i \in L}$ of random functions, where the $R_i$'s are jointly independently distributed. We call the set $L$ the length set. We call $R^L$ a length-preserving uniform random function on $\{0,1\}^L := \bigcup_{i \in L} \{0,1\}^i$. Similarly, let $P_i$ denote the uniform random permutation on $\{0,1\}^i$, i.e., the uniform distribution on the set $\mathrm{Perm}(\{0,1\}^i, \{0,1\}^i)$ of all permutations on $\{0,1\}^i$. Note that the inverse random permutation $P_i^{-1}$ is also a uniform random permutation. We similarly define the length-preserving uniform random permutation $P^L$ on $\{0,1\}^L$, which is the independent composition of $P_i$ for all $i \in L$.

Now let $\mathcal{A}$ be an oracle algorithm which has access to two length-preserving oracles $O_1$ and $O_2$. Suppose $\mathcal{A}$ makes queries from the set $\{0,1\}^L$ to both oracles. We define the SPRP advantage of $\mathcal{A}$ for a length-preserving random permutation $F^L$ (not necessarily uniform) by

$$\mathrm{Adv}^{\mathrm{sprp}}_{F^L}(\mathcal{A}) = \Pr[\mathcal{A}^{F^L, (F^L)^{-1}} = 1] - \Pr[\mathcal{A}^{P^L, (P^L)^{-1}} = 1].$$

In general, we can define the advantage for two pairs of tuples of length-preserving random functions $(F^L, F'^L)$ and $(G^L, G'^L)$ as

$$\mathrm{Adv}_{\mathcal{A}}((F^L, F'^L), (G^L, G'^L)) = \Pr[\mathcal{A}^{F^L, F'^L} = 1] - \Pr[\mathcal{A}^{G^L, G'^L} = 1].$$

If $\mathcal{A}$ interacts with a length-preserving random permutation $O_1$ and its inverse $O_2$, then we can assume the following:

1. $\mathcal{A}$ does not make any repeated query.

2. If $x_i$ is an $O_1$-query and $y_i$ is its response, then there is no $O_2$-query $x_j = y_i$ for some $j > i$, and vice versa.

We can assume these since the responses to these types of queries are already determined. An adversary satisfying the above conditions is called an allowed adversary.

Theorem 1. [11] Let $R^L$ and $R'^L$ be independently chosen length-preserving uniform random functions and let $P^L$ be a length-preserving uniform random permutation. Then for any allowed adversary $\mathcal{A}$ which makes at most $Q$ queries, we have

$$\mathrm{Adv}_{\mathcal{A}}((P^L, (P^L)^{-1}), (R^L, R'^L)) \le \ldots$$

where $m = \min\{i : i \in L\}$.

The above result says that a uniform length-preserving random permutation is very close to a uniform length-preserving random function. Thus, if we want to prove that an enciphering scheme is not SPRP-secure using a small number of queries, it is enough to compute the distinguishing advantage from uniform random functions for an allowed adversary. For example, when $Q = 3$, if for a length-preserving construction $F^L$, $\mathrm{Adv}_{\mathcal{A}}((F^L, (F^L)^{-1}), (R^L, R'^L)) =: c$ is significant for an allowed adversary $\mathcal{A}$, then (by the triangle inequality and Theorem 1) $\mathrm{Adv}^{\mathrm{sprp}}_{F^L}(\mathcal{A})$ is at least $c - 2^{-n+2}$, which is also significant.

Remark 1. The above is one direction of the implication of Theorem 1. The other application is to show that a construction $F^L$ is an SPRP by showing that $\mathrm{Adv}_{\mathcal{A}}((F^L, (F^L)^{-1}), (R^L, R'^L))$ is negligible.

3.2 SPRP Distinguishing Algorithm 
Distinguishing Algorithm $\mathcal{A}_0$ for XLS.

1. Make encryption query $(P_1, Q_1)$ and obtain response $(C_1, D_1)$.

2. Make decryption query $(C_2 := C_1, D_2 := D_1 + 1)$ and obtain response $(P_2, Q_2)$.

3. Make encryption query $(P_3 := P_2, Q_3)$ and obtain response $(C_3, D_3)$, where

$$Q_3 = Q_1 + (I + R^{-2}) \cdot (Q_1 + Q_2 + 1).$$

4. Return 1 if $D_3 = Q_1 + Q_3 + D_1$, and 0 otherwise.
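The following self-contained Python simulation (a sketch under stated assumptions, not the authors' code) runs $\mathcal{A}_0$ against a toy XLS with lazily sampled random permutations $E$ and $\mathcal{E}$, and against an ideal random permutation on $2n-1$ bits. It assumes $n = 12$, treats the constant written $1$ as the integer 1 (any nonzero $(n-1)$-bit constant plays the same role in the derivation), and implements $R^{-1}$ as a right rotation by one bit. The helper names (`LazyPerm`, `attack_A0`, and so on) are hypothetical.

```python
import random

class LazyPerm:
    """Uniform random permutation on {0,1}^bits, sampled lazily in both directions."""

    def __init__(self, bits, rng):
        self.size, self.rng = 1 << bits, rng
        self.fwd, self.inv = {}, {}

    def _fresh(self, used):
        z = self.rng.randrange(self.size)
        while z in used:
            z = self.rng.randrange(self.size)
        return z

    def __call__(self, x):
        if x not in self.fwd:
            y = self._fresh(self.inv)
            self.fwd[x], self.inv[y] = y, x
        return self.fwd[x]

    def inverse(self, y):
        if y not in self.inv:
            x = self._fresh(self.fwd)
            self.fwd[x], self.inv[y] = y, x
        return self.inv[y]

def rol(x, w):
    return ((x << 1) | (x >> (w - 1))) & ((1 << w) - 1)

def ror(x, w):
    """Right rotation by one bit, i.e. multiplication by R^{-1}."""
    return ((x >> 1) | (x << (w - 1))) & ((1 << w) - 1)

def mix2(a, b, w):
    t = rol(a ^ b, w)
    return a ^ t, b ^ t

def xls_oracles(n, rng):
    """Encryption/decryption oracles of toy XLS; bit strings are ints, first bit = MSB."""
    w, mask = n - 1, (1 << (n - 1)) - 1
    E, calE = LazyPerm(n, rng), LazyPerm(n, rng)

    def enc(p, q):
        y = E(p)                                      # a || A
        U, W = mix2(y & mask, q, w)
        z = calE((((y >> w) ^ 1) << w) | U)           # v || V = calE(!a || U)
        B, D = mix2(z & mask, W, w)
        return E((((z >> w) ^ 1) << w) | B), D        # C = E(!v || B)

    def dec(c, d):
        y = E.inverse(c)                              # b || B
        V, W = mix2(y & mask, d, w)
        z = calE.inverse((((y >> w) ^ 1) << w) | V)   # u || U
        A, Q = mix2(z & mask, W, w)
        return E.inverse((((z >> w) ^ 1) << w) | A), Q

    return enc, dec

def ideal_oracles(n, rng):
    """A uniform random permutation on {0,1}^(2n-1), split into n and n-1 bits."""
    w, mask = n - 1, (1 << (n - 1)) - 1
    pi = LazyPerm(2 * n - 1, rng)

    def enc(p, q):
        y = pi((p << w) | q)
        return y >> w, y & mask

    def dec(c, d):
        x = pi.inverse((c << w) | d)
        return x >> w, x & mask

    return enc, dec

def attack_A0(enc, dec, n, rng, one=1):
    """Algorithm A_0; `one` stands in for the constant 1 (value assumed, must be nonzero)."""
    w = n - 1
    p1, q1 = rng.getrandbits(n), rng.getrandbits(w)
    c1, d1 = enc(p1, q1)                              # query 1 (encryption)
    p2, q2 = dec(c1, d1 ^ one)                        # query 2 (decryption)
    x = q1 ^ q2 ^ one
    q3 = q1 ^ x ^ ror(ror(x, w), w)                   # Q3 = Q1 + (I + R^-2)(Q1 + Q2 + 1)
    c3, d3 = enc(p2, q3)                              # query 3 (encryption)
    return int(d3 == q1 ^ q3 ^ d1)

def success_rate(make_oracles, n=12, trials=2000, seed=7):
    rng = random.Random(seed)
    return sum(attack_A0(*make_oracles(n, rng), n, rng) for _ in range(trials)) / trials

real, ideal = success_rate(xls_oracles), success_rate(ideal_oracles)
print(f"XLS: {real:.3f}   ideal: {ideal:.4f}")
```

One expects the XLS success rate to be close to 1/2 (the event $a_1 = a_2$) and the ideal one to be close to $2^{-(n-1)}$. Lazy sampling is used because the attack touches only a handful of points, so full permutation tables on $2n - 1$ bits are unnecessary.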
Fig. 3.1. $\mathcal{A}_0$ makes three queries (Query-1: encryption, Query-2: decryption, Query-3: encryption) and obtains collisions on the $U_1$ and $U_3$ values with probability 1/2 (due to the event that $a_1 = a_2$).


3.3 Analysis of Attack 

To see why our attack works, let us first observe some useful relations among 
internal variables in the computations of XLS. 

Lemma 1. With the notation of algorithm XLS, we have $A + B = (R^{-1} + I)\cdot(Q + D)$.

Proof. Since mix2 is its own inverse, we have $(V, W) = \mathrm{mix2}(B, D)$. Equating this $W$ with the $W$ computed in line 02 of the XLS algorithm, we have

$$R \cdot B + (R+I) \cdot D = R \cdot A + (R+I) \cdot Q.$$

Thus $R \cdot (A + B) = (R + I) \cdot (Q + D)$, and multiplying both sides by $R^{-1}$ gives the result. □

Lemma 2. With the notation of algorithm XLS, we have $U + V = R^{-1}\cdot(Q + D)$.

Proof. Inverting lines 02 and 04 (mix2 is an involution), we have $R\cdot U + (I+R)\cdot W = Q$ and $R\cdot V + (I+R)\cdot W = D$. Thus $R\cdot(U+V) = D + Q$, and the result follows. □
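Both lemmas are linear identities, so they can be sanity-checked on random inputs. The sketch below (toy block size $n = 10$ and random permutation tables, our assumptions; function names hypothetical) traces lines 01 to 04 of XLS and tests Lemma 1 and Lemma 2, implementing multiplication by $R^{-1}$ as a right rotation.

```python
import random

def rol(x, w):
    return ((x << 1) | (x >> (w - 1))) & ((1 << w) - 1)

def ror(x, w):
    """Right rotation by one bit, i.e. multiplication by R^{-1}."""
    return ((x >> 1) | (x << (w - 1))) & ((1 << w) - 1)

def mix2(a, b, w):
    t = rol(a ^ b, w)
    return a ^ t, b ^ t

def xls_internals(p, q, E, calE, n):
    """Run lines 01-04 of toy XLS; return A, U, V, B and the output half D."""
    w, mask = n - 1, (1 << (n - 1)) - 1
    y = E[p]                                   # a || A
    A = y & mask
    U, W = mix2(A, q, w)
    z = calE[(((y >> w) ^ 1) << w) | U]        # v || V
    V = z & mask
    B, D = mix2(V, W, w)
    return A, U, V, B, D                       # line 05 does not change these values

def count_lemma_failures(n=10, trials=500, seed=1):
    rng = random.Random(seed)
    w = n - 1
    E, calE = list(range(1 << n)), list(range(1 << n))
    rng.shuffle(E)
    rng.shuffle(calE)
    failures = 0
    for _ in range(trials):
        p, q = rng.getrandbits(n), rng.getrandbits(w)
        A, U, V, B, D = xls_internals(p, q, E, calE, n)
        s = q ^ D                                  # Q + D
        failures += (A ^ B) != (ror(s, w) ^ s)     # Lemma 1
        failures += (U ^ V) != ror(s, w)           # Lemma 2
    return failures

print(count_lemma_failures())  # prints 0
```

It prints 0 because both identities hold for every input, not just with high probability.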

The basic idea of our attack is to obtain an internal collision. Suppose we have two queries $(P_i, Q_i)$ with responses $(C_i, D_i)$, $i = 1, 2$, such that the inputs $u_i \| U_i$ of $\mathcal{E}$ remain the same. Then the outputs $v_i \| V_i$ are also the same, and by Lemma 2 we have $Q_1 \oplus D_1 = Q_2 \oplus D_2$. For a uniform random permutation this event occurs with probability about $2^{-n+1}$. Now we show that in query 1 and query 3 the $V$ values collide with probability 1/2, so we can distinguish XLS from a uniform random permutation with advantage about 1/2 (for large $n$, $2^{-n+1}$ is negligible).
Theorem 2. Algorithm $\mathcal{A}_0$ has distinguishing advantage about 1/2 against XLS.

Proof. Note that $\mathcal{A}_0$ makes three encryption and decryption queries in an interleaved manner. We denote the intermediate variables of the computation of the $i$th query by the suffix $i$, $1 \le i \le 3$, and let $G = R^{-1} + I$. By Lemma 1 we have $A_1 + B_1 = G\cdot(Q_1 + D_1)$ and, since $C_2 = C_1$ implies $B_2 = B_1$, also $A_2 + B_1 = A_2 + B_2 = G\cdot(Q_2 + D_2)$. As $P_3 = P_2$ implies $A_3 = A_2$, we get $A_1 + A_3 = A_1 + A_2 = G\cdot(Q_1 + D_1 + Q_2 + D_2) = G\cdot(Q_1 + Q_2 + 1)$. Now we make our main claim.

Claim: $U_1 = U_3$.

$$\begin{aligned} U_1 + U_3 &= (R+I)\cdot(A_1 + A_3) + R\cdot(Q_1 + Q_3) \\ &= (R+I)\cdot G\cdot(Q_1 + Q_2 + 1) + R\cdot(Q_1 + Q_3) \\ &= (R + R^{-1})\cdot(Q_1 + Q_2 + 1) + R\cdot(Q_1 + Q_3). \end{aligned}$$

Since $Q_3 = Q_1 + (I + R^{-2})\cdot(Q_1 + Q_2 + 1)$, we have $R\cdot(Q_1 + Q_3) = (R + R^{-1})\cdot(Q_1 + Q_2 + 1)$, and hence $U_1 = U_3$. □

The rest of the proof is straightforward. As we have a collision on the $U$ values, we have a collision on the $V$ values, i.e., $V_1 = V_3$, provided that the first bits $u_1$ and $u_3$ of the inputs of $\mathcal{E}$ in queries 1 and 3 also match, which happens with probability 1/2. Assuming this, we can exploit the collision to mount the distinguishing attack discussed before the theorem: by Lemma 2, we have $D_3 = Q_1 + Q_3 + D_1$. For a uniform random permutation this holds with probability about $2^{-n+1}$. So the result follows. □
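The claim $U_1 = U_3$ depends only on the identity $(R+I)(R^{-1}+I) = R + R^{-1}$ and on the choice of $Q_3$. The short sketch below checks both over GF(2) for random $Q_1, Q_2$, taking $A_1 + A_3 = G\cdot(Q_1+Q_2+1)$ from the proof as given, with $R$ as a left rotation on a toy width of 31 bits and the constant $1$ taken as the integer 1 (both assumptions).

```python
import random

W = 31  # toy width n - 1 (assumed)

def rol(x, w=W):
    """Multiplication by R: left rotation by one bit."""
    return ((x << 1) | (x >> (w - 1))) & ((1 << w) - 1)

def ror(x, w=W):
    """Multiplication by R^{-1}: right rotation by one bit."""
    return ((x >> 1) | (x << (w - 1))) & ((1 << w) - 1)

def claim_holds(q1, q2, one=1):
    x = q1 ^ q2 ^ one                       # Q1 + Q2 + 1
    a_diff = ror(x) ^ x                     # A1 + A3 = G x, with G = R^{-1} + I
    q3 = q1 ^ x ^ ror(ror(x))               # Q3 = Q1 + (I + R^{-2}) x
    lhs = rol(a_diff) ^ a_diff              # (R + I) G x
    identity_ok = lhs == rol(x) ^ ror(x)    # (R + I) G = R + R^{-1}
    u_diff = lhs ^ rol(q1 ^ q3)             # U1 + U3
    return identity_ok and u_diff == 0

rng = random.Random(3)
assert all(claim_holds(rng.getrandbits(W), rng.getrandbits(W)) for _ in range(1000))
```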

Remark 2. The same attack works with the same advantage for any length of the form $kn - 1$; we only need the size of the partial block to be $n - 1$. Note that we need the first bits of the outputs of $E$ in queries 1 and 3 to match, which happens with probability 1/2. For inputs of other lengths the distinguishing advantage decreases, as we need collisions on more bits. In general, if we want to distinguish XLS only on $(kn + s)$-bit inputs, then we need a collision on the first $n - s$ bits of the outputs of $E$ in queries 1 and 3, which happens with probability about $2^{s-n}$. So the distinguishing advantage would be about $2^{s-n} - 2^{1-n}$. Hence, if the partial block size $s$ is small, the distinguishing advantage of our attack decreases. This is very natural, as most of the intermediate bits are processed through $\mathcal{E}$.
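For a feel of how fast the estimate in Remark 2 decays, the snippet below evaluates $2^{s-n} - 2^{1-n}$ exactly with Python's `fractions.Fraction` for a 128-bit block (the choice $n = 128$ and the sample values of $s$ are only illustrative).

```python
from fractions import Fraction

def approx_advantage(s: int, n: int) -> Fraction:
    """Remark 2 estimate 2^(s-n) - 2^(1-n) for (kn + s)-bit inputs."""
    return Fraction(2) ** (s - n) - Fraction(2) ** (1 - n)

n = 128
for s in (127, 120, 96, 64, 1):
    print(s, float(approx_advantage(s, n)))
```

For $s = n - 1$ this recovers about 1/2, while for very small $s$ the estimate is essentially zero (it is exactly zero at $s = 1$).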

4 Distinguishing Attack on GXLS on $\{0,1\}^{2n-1}$

Now we demonstrate how we can modify the distinguishing attack for GXLS. This suggests that no simple modification of XLS (such as replacing the mix functions with others) works. In other words, we show that the mode, not the mixing function, has an inherent weakness. The behavior of this distinguishing attack depends on different cases of the invertible mixing matrices $\mathrm{mix}_1$ and $\mathrm{mix}_2$. As we assume these to be linear permutations, we can represent them by the following $(2n-1) \times (2n-1)$ invertible matrices:
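$$\mathrm{mix}_1 = \begin{pmatrix} M[1]_{n \times n} & N[1]_{n \times (n-1)} \\ M[2]_{(n-1) \times n} & N[2]_{(n-1) \times (n-1)} \end{pmatrix}, \qquad \mathrm{mix}_2^{-1} = \begin{pmatrix} M'[1]_{n \times n} & N'[1]_{n \times (n-1)} \\ M'[2]_{(n-1) \times n} & N'[2]_{(n-1) \times (n-1)} \end{pmatrix}.$$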


Before we demonstrate our attacks, we state some notation and results which will be used.

Notation. Given an $r \times s$ matrix $A$, we denote by $\mathcal{C}(A)$ the column space of the matrix.


Lemma 3. Let $M_{r \times s}$ and $N_{r \times t}$ be two matrices and $c_{r \times 1}$ a vector such that $\mathcal{C}(N) \not\subseteq \mathcal{C}(M)$. For any two uniform random vectors $a \leftarrow \{0,1\}^s$ and $q \leftarrow \{0,1\}^t$ (not necessarily independent), $\Pr[M \cdot a = N \cdot q + c] \le 1/2$.

Proof. This is straightforward when $M$ is of the form $\begin{pmatrix} I_r & * \\ 0 & 0 \end{pmatrix}$, where $r$ is the rank of $M$ and $*$ means that the sub-matrix could be anything. In this case there must exist $i > r$ such that the $i$th row of $N$ is non-zero. As $q$ is uniform, the $i$th entry of $N \cdot q$ is uniform on $\{0,1\}$, while the $i$th entry of $M \cdot a$ is zero; so, by equating the event on the $i$th bit, we get probability at most 1/2.

For a general matrix $M$, we can find two non-singular square matrices $S$ and $T$ such that $S \cdot M \cdot T = \begin{pmatrix} I_r & * \\ 0 & 0 \end{pmatrix}$. So the given probability $p$ equals

$$\Pr[SMT \cdot (T^{-1} a) = SN \cdot q + Sc].$$

Let us denote $M' = SMT$, $a' = T^{-1}a$, $c' = Sc$ and $N' = SN$. With this notation, we have $p = \Pr[M' \cdot a' = N' \cdot q + c']$. Now note that $M'$ has the form considered before. Due to the invertibility of $S$ and $T$, we have $\mathcal{C}(N') \not\subseteq \mathcal{C}(M')$, and $a'$ and $q$ individually follow uniform distributions. □

Now we describe our attacks for different cases of the sub-matrices of the mix functions. As before, we use the suffixes 1, 2 and 3 to denote intermediate values for the first, second and third queries, respectively.


4.1 Case: $\mathrm{rank}(M[2]) \le n-2$

In this case we first claim that the column space of $N[2]$ must contain a vector which does not belong to the column space of $M[2]$. Otherwise, the rank of the $(n-1) \times (2n-1)$ matrix $(M[2]\ N[2])$ is at most $n-2$, which contradicts the invertibility of $\mathrm{mix}_1$.

Now we run algorithm $\mathcal{A}_0$ for the first two queries only. As before, $C_2 = C_1$ gives $B_2 = B_1$, so $W_1 + W_2 = N'[2]\cdot(D_1 + D_2) = N'[2]\cdot 1$, and hence $M[2](A_1 + A_2) = N[2](Q_1 + Q_2) + N'[2]\cdot 1$. Using Lemma 3, we know that when the algorithm interacts with a uniform random permutation, the probability that $N[2](Q_1 + Q_2) + N'[2]\cdot 1$ belongs to the column space of $M[2]$ is at most 1/2. However, for GXLS it occurs with probability one. So we can distinguish with advantage at least 1/2. We formally describe the attack algorithm $\mathcal{A}_1$ below.
Distinguishing Algorithm $\mathcal{A}_1$.

1. Make encryption query $(P_1, Q_1)$ and obtain response $(C_1, D_1)$.

2. Make decryption query $(C_2 := C_1, D_2 := D_1 + 1)$ and obtain response $(P_2, Q_2)$.

3. Return 1 if $N[2](Q_1 + Q_2) + N'[2]\cdot 1 \in \mathcal{C}(M[2])$, and 0 otherwise.


Note that, given a vector $v$ and a matrix $M[2]$, there is an efficient algorithm to check whether $v$ belongs to the column space of $M[2]$. For this we essentially need to solve the system of equations $M[2]\cdot x = v$; whenever we arrive at a contradiction, no solution exists, equivalently $v$ is not a member of the column space. Alternatively, we can first find invertible matrices $S$ and $T$ (by standard elementary operations) such that

$$S \cdot M[2] \cdot T = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix},$$

where $r$ denotes the rank of $M[2]$. So $M[2]\cdot x = v$ if and only if

$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \cdot (T^{-1} x) = S \cdot v,$$

and such an $x$ exists if and only if, for all $i > r$, the $i$th entry of $S\cdot v$ is zero.
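The membership test in step 3 of $\mathcal{A}_1$ can be sketched in a few lines of Python. This version compares GF(2) ranks, $\mathrm{rank}(M[2]) = \mathrm{rank}(M[2] \mid v)$, which is equivalent to the solvability of $M[2]\cdot x = v$ described above rather than the $S, T$ normal form; it assumes each column is stored as an integer bitmask (row $i$ as bit $i$), and the function names are hypothetical.

```python
def gf2_rank(vectors: list) -> int:
    """Rank over GF(2) of vectors stored as int bitmasks (Gaussian elimination)."""
    basis = {}  # leading-bit position -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]  # clears the leading bit; only lower bits change
    return len(basis)

def in_column_space(columns: list, v: int) -> bool:
    """True iff M x = v is solvable, where M is given by its columns."""
    return gf2_rank(columns + [v]) == gf2_rank(columns)

# Hypothetical 3 x 2 matrix with columns 011 and 110 (row i stored as bit i).
cols = [0b011, 0b110]
print(in_column_space(cols, 0b101), in_column_space(cols, 0b100))  # True False
```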

Remark 3. A similar attack works when $\mathrm{rank}(M'[2]) \le n-2$. In this case we only need to interchange the roles of encryption and decryption queries.


4.2 Case: $\mathrm{rank}(M[2]) = \mathrm{rank}(M'[2]) = n-1$, $\mathrm{rank}(N[1]) \le n-2$

As $N[1]$ does not have full rank, we can find $Q_1 \ne Q_2$ such that $N[1]Q_1 = N[1]Q_2$. So the $U$ values collide for the two encryption queries $(P, Q_1)$ and $(P, Q_2)$. Now we write the relationships among the intermediate variables. Since both queries use the same $P$, we have $A_1 = A_2$, and due to the choice of $Q_1$ and $Q_2$ we also have $U_1 = U_2$ and hence $V_1 = V_2$. Now let us write the $\mathrm{mix}_2$ function as

$$\mathrm{mix}_2 = \begin{pmatrix} M''[1]_{n \times n} & N''[1]_{n \times (n-1)} \\ M''[2]_{(n-1) \times n} & N''[2]_{(n-1) \times (n-1)} \end{pmatrix}.$$

By the applications of the $\mathrm{mix}_1$ and $\mathrm{mix}_2$ functions for the two queries, we have

1. $W_1 = M[2]\cdot A_1 + N[2]\cdot Q_1$, $W_2 = M[2]\cdot A_2 + N[2]\cdot Q_2$, and

2. $D_1 = M''[2]\cdot V_1 + N''[2]\cdot W_1$, $D_2 = M''[2]\cdot V_2 + N''[2]\cdot W_2$.

So $W_1 + W_2 = N[2]\cdot(Q_1 + Q_2)$ and $D_1 + D_2 = N''[2]\cdot(W_1 + W_2) = N''[2]\cdot N[2]\cdot(Q_1 + Q_2)$. Note that for a random function we observe this with probability $2^{-n+1}$. We formally describe the attack algorithm $\mathcal{A}_2$ below.
Distinguishing Algorithm $\mathcal{A}_2$.

1. Let $Q_1 \ne Q_2$ be such that $N[1]Q_1 = N[1]Q_2$.

2. Make encryption query $(P_1, Q_1)$ and obtain response $(C_1, D_1)$.

3. Make encryption query $(P_1, Q_2)$ and obtain response $(C_2, D_2)$.

4. Return 1 if $D_2 = D_1 + N''[2]\cdot N[2]\cdot(Q_1 + Q_2)$, and 0 otherwise.
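Step 1 of $\mathcal{A}_2$ only needs a nonzero vector $\delta$ in the kernel of $N[1]$; then $Q_2 = Q_1 + \delta$. A minimal GF(2) sketch follows, with matrices given as lists of column bitmasks and $\delta$ as a bitmask selecting columns (our conventions); the toy $N[1]$ below is hypothetical and chosen to have rank $n - 2$ for $n = 4$.

```python
def gf2_kernel_vector(columns: list):
    """Nonzero x (bit j selects column j) whose selected columns XOR to zero,
    or None if the columns are linearly independent."""
    basis = {}  # leading bit -> (reduced vector, combination of column indices)
    for j, col in enumerate(columns):
        vec, combo = col, 1 << j   # invariant: vec is the XOR of the columns in combo
        while vec:
            top = vec.bit_length() - 1
            if top not in basis:
                basis[top] = (vec, combo)
                break
            bvec, bcombo = basis[top]
            vec, combo = vec ^ bvec, combo ^ bcombo
        else:
            return combo           # vec reduced to 0: the columns in combo sum to zero
    return None

def gf2_matvec(columns: list, x: int) -> int:
    """M x over GF(2), where M is given by its columns and x by a bitmask."""
    acc = 0
    for j, col in enumerate(columns):
        if (x >> j) & 1:
            acc ^= col
    return acc

# Hypothetical N[1] for n = 4: a 4 x 3 matrix of rank 2 (third column = first + second).
N1 = [0b0011, 0b0101, 0b0110]
delta = gf2_kernel_vector(N1)
q1 = 0b101
q2 = q1 ^ delta
assert q2 != q1 and gf2_matvec(N1, q1) == gf2_matvec(N1, q2)
```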


4.3 Case: $\mathrm{rank}(M[2]) = \mathrm{rank}(M'[2]) = n-1$, $\mathrm{rank}(N[1]) = n-1$

We make three queries as in $\mathcal{A}_0$, except for the choice of $Q_3$, whose value is determined below. We have

1. $U_1 + U_3 = M[1](A_1 + A_2) + N[1]Q_1 + N[1]Q_3$ (as $P_3 = P_2$ gives $A_3 = A_2$).

2. $M[2](A_1 + A_2) = N[2](Q_1 + Q_2) + N'[2](D_1 + D_2)$ (from the computations of $W_1$ and $W_2$).

As the rank of $M[2]$ is $n-1$ and the right-hand side of item 2 is known, we can guess $A_1 + A_2$ correctly with probability 1/2 (since there are only two choices). So we can guess $M[1](A_1 + A_2)$ from $M[2](A_1 + A_2)$ with probability at least 1/2. Let $X$ be the guessed value of $M[1](A_1 + A_2)$. We now choose $Q_3$ such that $U_1 + U_3 = 0$ (i.e., $U_1 = U_3$). From item 1 above, we define $Q_3 = N[1]^{-1}X + Q_1$. Note that $N[1]$ is assumed to be invertible in this case. So $\Pr[U_1 = U_3] \ge 1/2$. This essentially leads to a distinguisher similar to the one for XLS. However, we need to compute the distinguishing event similarly to the computation in the previous case. By the applications of the $\mathrm{mix}_1$, $\mathrm{mix}_2^{-1}$ and $\mathrm{mix}_2$ functions for the three queries, we have

1. $W_1 + W_2 = N'[2]\cdot(D_1 + D_2)$,

2. $W_2 + W_3 = N[2]\cdot(Q_2 + Q_3)$, and

3. $N''[2]\cdot(W_1 + W_3) = D_1 + D_3$.

So we have $D_3 = D_1 + N''[2]\cdot(N'[2]\cdot(D_1 + D_2) + N[2]\cdot(Q_2 + Q_3))$, which can be observed with probability $2^{-n+1}$ for a random function. We formally describe the attack algorithm $\mathcal{A}_3$ below.


Distinguishing Algorithm $\mathcal{A}_3$.

1. Make encryption query $(P_1, Q_1)$ and obtain response $(C_1, D_1)$.

2. Make decryption query $(C_2 := C_1, D_2 := D_1 + 1)$ and obtain response $(P_2, Q_2)$.

3. Guess $M[1](A_1 + A_2)$, denoted $X$, from $N[2](Q_1 + Q_2) + N'[2](D_1 + D_2)$.

4. Choose $Q_3$ such that $N[1](Q_1 + Q_3) = X$.

5. Make encryption query $(P_3 := P_2, Q_3)$ and obtain response $(C_3, D_3)$.

6. Return 1 if $D_3 = D_1 + N''[2]\cdot(N'[2]\cdot(D_1 + D_2) + N[2]\cdot(Q_2 + Q_3))$.

7. Return 0 otherwise.
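Step 3 of $\mathcal{A}_3$ relies on $M[2]\cdot x = y$ having exactly two solutions when $M[2]$ is $(n-1) \times n$ of rank $n-1$. For toy sizes this can be illustrated by brute force, as in the sketch below; the matrices are hypothetical, and a realistic implementation would use Gaussian elimination instead of enumerating $\{0,1\}^n$.

```python
import random

def gf2_matvec(columns: list, x: int) -> int:
    """M x over GF(2); M given by column bitmasks, x by a bitmask over column indices."""
    acc = 0
    for j, col in enumerate(columns):
        if (x >> j) & 1:
            acc ^= col
    return acc

def preimages(columns: list, y: int, n: int) -> list:
    """All x in {0,1}^n with M x = y, by brute force (toy n only)."""
    return [x for x in range(1 << n) if gf2_matvec(columns, x) == y]

def guess_X(M1_cols: list, M2_cols: list, rhs: int, n: int, rng: random.Random):
    """Step 3 of A_3: pick one solution of M[2] x = rhs at random and return M[1] x."""
    candidates = preimages(M2_cols, rhs, n)
    return gf2_matvec(M1_cols, rng.choice(candidates)), candidates

# Hypothetical toy matrices for n = 4: M[2] is 3 x 4 of rank 3, M[1] is the 4 x 4 identity.
n = 4
M2 = [0b001, 0b010, 0b100, 0b111]
M1 = [0b0001, 0b0010, 0b0100, 0b1000]
true_diff = 0b0110                      # plays the role of the unknown A1 + A2
X, cands = guess_X(M1, M2, gf2_matvec(M2, true_diff), n, random.Random(0))
print(cands)  # [6, 9]
```

If $M[1]$ happens to map both candidates to the same value, the guess of $X$ is always right, which is consistent with the bound "at least 1/2" in the text.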
5 Conclusion

In this paper we provide a chosen-plaintext and chosen-ciphertext distinguishing attack (i.e., an SPRP distinguisher) on XLS. It makes three encryption and decryption calls and has distinguishing advantage about 1/2. This attack can be further extended to a general form of XLS following the mix-then-encrypt paradigm. We believe that XLS cannot be repaired without introducing some non-linear functionality, e.g., an additional blockcipher call. So we need four blockcipher calls to make this type of design secure. Both the Elastic blockcipher and Nandi's construction make four calls to non-linear functions. However, Nandi's construction could potentially be faster, as it requires two universal hash invocations (each of which can be achieved by applying four rounds of AES [6]) and one call to a weak PRF (optimistically, one can apply eight rounds of AES) in addition to a full blockcipher call (e.g., ten rounds of AES). So in total it requires 26 rounds of AES, which is much faster than four full invocations of AES.
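The round count in the last sentence can be spelled out as follows, under the optimistic per-component estimates stated above (this is only arithmetic on those estimates, not a benchmark):

$$\underbrace{2 \times 4}_{\text{two universal hashes}} + \underbrace{8}_{\text{weak PRF}} + \underbrace{10}_{\text{full AES}} = 26 \;<\; 40 = \underbrace{4 \times 10}_{\text{four full AES calls}}.$$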
Abstract. Structure-preserving signatures are a quite recent but important building block for many cryptographic protocols. In this paper, we introduce a new type of structure-preserving signatures, which allows one to sign group element vectors and to consistently randomize signatures and messages without knowledge of any secret. More precisely, we consider messages to be (representatives of) equivalence classes on vectors of group elements (coming from a single prime order group), which are determined by the mutual ratios of the discrete logarithms of the representative's vector components. By multiplying each component with the same scalar, a different representative of the same equivalence class is obtained. We propose a definition of such a signature scheme and a security model, and give an efficient construction which is secure in the SXDH setting, where EUF-CMA security holds against generic forgers in the generic group model and the so-called class hiding property holds under the DDH assumption.

As a second contribution, we use the proposed signature scheme to build an efficient multi-show attribute-based anonymous credential (ABC) system that allows one to encode an arbitrary number of attributes. This is, to the best of our knowledge, the first ABC system that provides constant-size credentials and constant-size showings. To allow an efficient construction in combination with the proposed signature scheme, we also introduce a new, efficient, randomizable polynomial commitment scheme. Aside from these two building blocks, the credential system requires a very short and constant-size proof of knowledge to provide freshness in the showing protocol.


1 Introduction 

Digital signatures are an important cryptographic primitive for providing integrity protection, non-repudiation and authenticity of messages in a publicly verifiable way. In most signature schemes, the message space consists of integers in $\mathbb{Z}_{\mathrm{ord}(G)}$ for some group $G$, or of arbitrary strings encoded either to integers in $\mathbb{Z}_{\mathrm{ord}(G)}$ or to elements of a group $G$ using a suitable hash function. In the latter case, the hash function is usually required to be modeled as a random oracle (thus, one signs random group elements). In contrast, structure-preserving signatures |SSlHl1 IKI4) can handle messages which are elements of two groups $G_1$ and $G_2$ equipped with a bilinear map, without requiring any prior encoding. Basically, in a structure-preserving signature scheme the public key, the messages and the signatures consist only of group elements, and the verification algorithm evaluates a signature by deciding group membership of elements in the signature and by evaluating pairing product equations. Such signature schemes typically allow one to sign vectors of group elements (from one of the two groups $G_1$ and $G_2$, or mixed) and also support some types of randomization (inner, sequential, etc., cf. DSD-

Randomization is one interesting feature of signatures, as a given signature 
can be randomized to another unlinkable version of the signature for the same 
message. Besides randomizable structure-preserving signatures, there are various 
other constructions of such signature schemes [24125118143] . We emphasize that 
although these schemes are randomizable, they are still secure digital signatures 
in the standard sense (EUF-CMA security). 

We are interested in constructions of structure-preserving signature schemes that not only allow randomization of the signature, but also allow randomization of the signed message in particular ways. Such signature schemes are particularly interesting for applications in privacy-enhancing cryptographic protocols.


1.1 Contribution 

This paper has three contributions: a novel type of structure-preserving signatures defined on equivalence classes on group element vectors, a novel randomizable polynomial commitment scheme which allows opening factors of the polynomial committed to, and a new construction (type) of multi-show attribute-based anonymous credentials (ABCs), which is instantiated from the first two contributions.

Structure-Preserving Signature Scheme on Equivalence Classes.

Inspired by randomizable signatures, we introduce a novel variant of structure-preserving signatures. Instead of signing particular message vectors as in other schemes, the scheme produces signatures on classes of an equivalence relation $\mathcal{R}$ defined on $(G_1^*)^\ell$ with $\ell > 1$ (where we use $G_1^*$ to denote $G_1 \setminus \{0_{G_1}\}$). More precisely, we consider messages to be (representatives of) equivalence classes on $(G_1^*)^\ell$, which are determined by the mutual ratios of the discrete logarithms of the representative's vector components. By multiplying each component with the same scalar, a different representative of the same equivalence class is obtained. Initially, an equivalence class is signed by signing an arbitrary representative. Later, one can obtain a valid signature for every other representative of this class without having access to the secret key. Furthermore, we require two representatives of the same class with corresponding signatures to be unlinkable, which we call class hiding. We present a definition of such a signature scheme along with game-based notions of security, and present an efficient construction which produces short and constant-size signatures that are independent of the message vector length $\ell$. In the full version [37], we prove the unforgeability of our construction against generic forgers in the generic group model and its class hiding under the DDH assumption, respectively.
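As a toy illustration of the equivalence relation (our sketch, not the paper's construction), one can model a vector of group elements $(g^{m_1}, \ldots, g^{m_\ell})$ by its exponent vector $(m_1, \ldots, m_\ell)$ modulo a small prime, here 1019, chosen arbitrarily. Two vectors then lie in the same class exactly when one is a nonzero scalar multiple of the other, which is what the hypothetical `same_class` tests. The real scheme never sees these discrete logarithms; they are exposed here only to show the relation.

```python
P = 1019  # toy prime standing in for the group order (assumption)

def randomize(m: list, mu: int) -> list:
    """Another representative of the same class: multiply every exponent by mu."""
    return [mu * x % P for x in m]

def same_class(m1: list, m2: list) -> bool:
    """True iff m2 = mu * m1 for some mu in Z_P^* (all entries must be nonzero mod P)."""
    if len(m1) != len(m2) or any(x % P == 0 for x in m1 + m2):
        return False
    mu = m2[0] * pow(m1[0], -1, P) % P   # candidate scalar from the first components
    return all(mu * x % P == y % P for x, y in zip(m1, m2))

m = [3, 5, 7]
print(same_class(m, randomize(m, 42)), same_class(m, [3, 5, 8]))  # True False
```

Checking the ratio against the first component suffices because a common scalar, if it exists, is determined by any single nonzero component.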

Polynomial Commitments with Factor Openings. We propose a new, efficient, randomizable polynomial commitment scheme. It is computationally binding and unconditionally hiding, allows committing to monic, reducible polynomials, and is represented by an element of a bilinear group. It allows opening factors of committed polynomials, and re-randomization (i.e., multiplication with a scalar) does not change the polynomial committed to, but requires only a consistent randomization of the witnesses involved in the factor openings. We present a definition as well as a construction of such a polynomial commitment scheme. In the full version [37], we give a security model in which we also prove the construction secure.

A Multi-Show Attribute-Based Anonymous Credential (ABC) System. We describe a new way to build multi-show ABCs (henceforth, we will only write ABCs) as an application of the first two contributions. From another perspective, the signature scheme allows one to consistently randomize a vector of group elements and its signature. So it seems natural to use this property to achieve unlinkability during the showings of an ABC system. To enable a compact attribute representation which is compatible with the randomization property of the signature scheme, we encode the attributes to polynomials and commit to them using the introduced polynomial commitment scheme. During the issuing, the obtainer is then given a set of attributes and the credential, which is a message (vector) consisting of the polynomial commitment and the generator of the group, plus the corresponding signature. During a showing, a subset of the issued attributes can be shown by opening the corresponding factors of the committed polynomial. The unlinkability of showings is achieved through the inherent re-randomization properties of the signature scheme and the polynomial commitment scheme, which are compatible with each other. Furthermore, to provide freshness during a showing, we require a very small, constant-size proof of knowledge. We emphasize that our approach to constructing ABCs is very different from existing approaches, as we use zero-knowledge proofs neither for proving the possession of a signature nor for selectively disclosing attributes during showings. Recall that existing approaches rely on signature schemes that allow signing vectors of attributes and use efficient zero-knowledge proofs to show possession of a signature and to prove relations about the signed attributes during a showing.
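The factor-opening idea can be illustrated with plain polynomial arithmetic. The sketch below assumes, purely for illustration, that an attribute set $\{a_1, \ldots, a_k\}$ is encoded as the monic polynomial $\prod_i (X - a_i)$ over $\mathbb{Z}_p$ for a toy prime $p = 1019$; the exact encoding used in the paper is not specified in this section, and no commitment, hiding or randomization is modeled. Showing a subset then corresponds to exhibiting a factor that divides the encoded polynomial.

```python
P = 1019  # toy prime modulus (assumption)

def poly_mul(f: list, g: list) -> list:
    """Product of polynomials over Z_P; coefficient lists from low to high degree."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def encode(attrs: list) -> list:
    """Hypothetical encoding: the monic polynomial prod (X - a) over Z_P."""
    f = [1]
    for a in attrs:
        f = poly_mul(f, [(-a) % P, 1])
    return f

def divides(g: list, f: list) -> bool:
    """True iff the monic polynomial g divides f over Z_P (schoolbook long division)."""
    r = list(f)
    dg = len(g) - 1
    for k in range(len(r) - 1, dg - 1, -1):
        coef = r[k]
        if coef:
            for i in range(dg + 1):          # subtract coef * X^(k - dg) * g
                r[k - dg + i] = (r[k - dg + i] - coef * g[i]) % P
    return all(c == 0 for c in r[:dg])      # the remainder must vanish

issued = encode([3, 17, 42])
print(divides(encode([3, 42]), issued), divides(encode([5]), issued))  # True False
```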

Interestingly, in our construction the size of credentials as well as the size of the showings are independent of the number of attributes in the ABC system, i.e., a small, constant number of group elements. This is, to the best of our knowledge, the first ABC system with this feature. The proposed ABC system is secure in a security model adapted from [2318126127], where we refer the reader to the full version [37] for the proofs and the security model. Finally, we compare our system to other existing multi- and one-show ABC approaches. Although we are only dealing with multi-show credentials, for the sake of completeness, we also compare our approach to the one-show (i.e., linkable) anonymous credentials of Brands [20] (and, thus, also its provably secure generalization pL2jh

1.2 Related Work 

In PE], Blazy et al. present signatures on randomizable ciphertexts (based on linear encryption El) using a variant of Waters' signature scheme 03]. Basically, anyone given a signature on a ciphertext can randomize the ciphertext and adapt the signature accordingly, while maintaining public verifiability and without knowing either the signing key or the encrypted message. However, as these signatures only allow randomizing the ciphertexts and not the underlying plaintexts, this approach is not useful for our purposes.

Another somewhat related approach is the proofless variant of the Chaum-Pedersen signature m, which is used by Verheul in [42] to build self-blindable certificates. The resulting so-called certificate as well as the initial message can be randomized using the same scalar, preserving the validity of the certificate. This approach works for the construction in m, but it does not represent a secure signature scheme (as also observed in 02) due to its homomorphic property and the possibility of efficient existential forgeries.

Homomorphic signatures for network coding Qjj] allow signing any subspace of a vector space by producing a signature for every basis vector with respect to the same (file) identifier. Consequently, the message space consists of identifiers and vectors. These signatures are homomorphic, meaning that given a sequence of scalar and signature pairs $(\beta_i, \sigma_i)_{i=1}^{\ell}$ for vectors $v_i$, one can publicly compute a signature for the vector $v = \sum_{i=1}^{\ell} \beta_i v_i$ (this is called derive). If one used a unique identifier per signed vector $v$, then such linearly homomorphic signatures would support a functionality similar to the one provided by our scheme, i.e., publicly computing signatures for vectors $v' = \beta v$ (although they are not structure-preserving). It is also known that various existing constructions, e.g., mm, are strong context hiding, meaning that original and derived signatures are unlinkable. Nevertheless, this does not help in our context, due to the following argument: if we do not restrict every single signed vector to a unique identifier, the signature schemes are homomorphic, which is not compatible with our unforgeability goal. If we apply this restriction, however, then we are not able to achieve class hiding, as all signatures can be linked to the initial signature by the unique identifier. We note that the same arguments also apply to structure-preserving linearly homomorphic signatures [40].

The aforementioned context hiding property is also of interest in more general classes of homomorphic (also called malleable) signature schemes (defined in [7] and refined in [9]). In [29], the authors discuss malleable signatures that allow deriving a signature $\sigma'$ on a message $m' = T(m)$ for an "allowable" transformation $T$, when given a signature $\sigma$ for a message $m$. This can be considered as a generalization of signature schemes such as quotable [10] or redactable signatures [38], with the additional property of being context hiding. The authors note that for messages being pseudonyms and transformations that transfer one pseudonym into another pseudonym, such malleable signatures can be used to construct anonymous credential systems. They also demonstrate how to build delegatable anonymous credential systems mm-. The general construction in [29] relies on malleable ZKPs [25] and is not really efficient, even when instantiated with Groth-Sahai proofs [35]. Although it is conceptually totally different from our approach, we note that by viewing our scheme in a different way, our scheme fits into their definition of malleable signatures (such that their SigEval algorithm takes only a single message vector with corresponding signature and a single allowable transformation). However, firstly, our construction is far more efficient than their approach (and in particular really practical) and, secondly, [29] only focuses on transformations of single messages (pseudonyms) and does not consider multi-show attribute-based anonymous credentials at all (which is the main focus of our construction).

Signatures providing randomization features […] along with efficient proofs of knowledge of committed values can be used to generically construct ABC systems. The most prominent approaches based on Σ-protocols are CL credentials [24,25]. With the advent of Groth-Sahai proofs, which allow (efficient) non-interactive proofs in the CRS model without random oracles, various constructions of so-called delegatable (hierarchical) anonymous credentials have been proposed […]. By definition, these provide a non-interactive showing protocol, i.e., the show and verify algorithms do not interact when demonstrating the possession of a credential. In [34], Fuchsbauer presented the first delegatable anonymous credential system that also provides a non-interactive delegation protocol based on so-called commuting signatures and verifiable encryption. We note that although such credential systems with non-interactive protocols extend the scope of applications of anonymous credentials, the most common use case (i.e., authentication and authorization) essentially relies on interaction (to provide freshness/liveness). We emphasize that our goal is not to construct non-interactive anonymous credentials. Nevertheless, one could generically convert our proposed system into a non-interactive one, either in the ROM using Fiat-Shamir or by replacing our single Σ-proof for freshness with a Groth-Sahai proof without random oracles; this is, however, out of scope of this paper.

2 Preliminaries 

Definition 1 (Bilinear Map). Let $\mathbb{G}_1$, $\mathbb{G}_2$ and $\mathbb{G}_T$ be cyclic groups of prime order $p$, where $\mathbb{G}_1$ and $\mathbb{G}_2$ are additive and $\mathbb{G}_T$ is multiplicative. Let $P$ and $P'$ generate $\mathbb{G}_1$ and $\mathbb{G}_2$, respectively. We call $e: \mathbb{G}_1 \times \mathbb{G}_2 \rightarrow \mathbb{G}_T$ a bilinear map or pairing if it is efficiently computable and the following conditions hold:

Bilinearity: $e(aP, bP') = e(P, P')^{ab} = e(bP, aP')$ for all $a, b \in \mathbb{Z}_p$.
Non-degeneracy: $e(P, P') \neq 1_{\mathbb{G}_T}$, i.e., $e(P, P')$ generates $\mathbb{G}_T$.

If $\mathbb{G}_1 = \mathbb{G}_2$, then $e$ is called symmetric (Type-1), and asymmetric (Type-2 or Type-3) otherwise. For Type-2 pairings there is an efficiently computable isomorphism $\Psi: \mathbb{G}_2 \rightarrow \mathbb{G}_1$, whereas for Type-3 pairings no such efficient isomorphism is assumed to exist. Note that Type-3 pairings are currently the optimal choice with respect to the efficiency and security trade-off [30].
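To make the two pairing conditions concrete, the following Python sketch uses a deliberately insecure toy model that is not part of the paper: every element of $\mathbb{G}_1$, $\mathbb{G}_2$ and $\mathbb{G}_T$ is stored as its discrete logarithm modulo a prime $p$ (here the Mersenne prime $2^{61}-1$, an arbitrary choice), so the pairing becomes multiplication of exponents. Because discrete logarithms are exposed, DDH and SXDH are trivially broken in this model; it only lets one check that equations such as bilinearity hold. The helper names `rand_scalar` and `pair` are hypothetical.

```python
import secrets

# INSECURE toy model, for checking algebra only: a G1 element aP, a G2 element bP'
# and a GT element e(P, P')^c are all stored as their discrete logs mod p.
p = 2**61 - 1  # a Mersenne prime, standing in for the common group order

def rand_scalar() -> int:
    """Sample a uniformly random element of Z_p^*."""
    return secrets.randbelow(p - 1) + 1

def pair(a: int, b: int) -> int:
    """e(aP, bP') = e(P, P')^(ab): in exponent form, multiply mod p."""
    return (a * b) % p

P, P2 = 1, 1  # generators P of G1 and P' of G2 (discrete log 1)

a, b = rand_scalar(), rand_scalar()
# Bilinearity: e(aP, bP') = e(P, P')^(ab) = e(bP, aP')
assert pair(a * P % p, b * P2 % p) == (a * b) % p == pair(b * P % p, a * P2 % p)
# Non-degeneracy: e(P, P') is not the identity of GT (exponent 0)
assert pair(P, P2) != 0
```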
Definition 2 (Decisional Diffie-Hellman Assumption (DDH)). Let $p$ be a prime of bitlength $\kappa$, $\mathbb{G}$ be a group of prime order $p$ generated by $P$ and let $(P, aP, bP, cP) \in \mathbb{G}^4$, where $a, b, c \xleftarrow{R} \mathbb{Z}_p^*$. Then, for every PPT adversary $\mathcal{A}$, distinguishing between $(P, aP, bP, abP) \in \mathbb{G}^4$ and $(P, aP, bP, cP) \in \mathbb{G}^4$ is infeasible, i.e., there is a negligible function $\epsilon(\cdot)$ such that

$$\left|\Pr[\mathsf{true} \leftarrow \mathcal{A}(P, aP, bP, abP)] - \Pr[\mathsf{true} \leftarrow \mathcal{A}(P, aP, bP, cP)]\right| \le \epsilon(\kappa).$$

Definition 3 (Symmetric External DH Assumption (SXDH) [13]). Let $\mathbb{G}_1$, $\mathbb{G}_2$ and $\mathbb{G}_T$ be three distinct cyclic groups of prime order $p$ and $e: \mathbb{G}_1 \times \mathbb{G}_2 \rightarrow \mathbb{G}_T$ be a pairing. Then, the SXDH assumption states that the DDH assumption holds in both groups $\mathbb{G}_1$ and $\mathbb{G}_2$.

Note that the SXDH assumption formalizes Type-3 pairings, i.e., the absence of an efficiently computable isomorphism between $\mathbb{G}_1$ and $\mathbb{G}_2$ as well as between $\mathbb{G}_2$ and $\mathbb{G}_1$.


Definition 4 (Bilinear Group Generator). Let $\mathsf{BGGen}$ be a PPT algorithm which takes a security parameter $\kappa$ and generates a bilinear group $\mathsf{BG} = (p, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, e, P, P')$ in the SXDH setting, where the common group order $p$ of the groups $\mathbb{G}_1$, $\mathbb{G}_2$ and $\mathbb{G}_T$ is a prime of bitlength $\kappa$, $e$ is a pairing and $P$ as well as $P'$ are generators of $\mathbb{G}_1$ and $\mathbb{G}_2$, respectively.

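Definition 5 ($t$-Strong DH Assumption ($t$-SDH) […]). Let $p$ be a prime of bitlength $\kappa$, $\mathbb{G}$ be a group of prime order $p$ generated by $P \in \mathbb{G}$, $\alpha \xleftarrow{R} \mathbb{Z}_p^*$ and let $(\alpha^i P)_{i=0}^{t} \in \mathbb{G}^{t+1}$ for some $t > 0$. Then, for every PPT adversary $\mathcal{A}$ there is a negligible function $\epsilon(\cdot)$ such that

$$\Pr\left[\left(c, \tfrac{1}{\alpha + c}P\right) \leftarrow \mathcal{A}\left((\alpha^i P)_{i=0}^{t}\right)\right] \le \epsilon(\kappa)$$

for some $c \in \mathbb{Z}_p \setminus \{-\alpha\}$.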


This assumption turns out to be very useful in bilinear groups (Type-1 or Type-2 setting). However, in a Type-3 setting (SXDH assumption), where the groups $\mathbb{G}_1$ and $\mathbb{G}_2$ are strictly separated, the presence of a pairing does not give any additional benefit, because the problem instance is given either in $\mathbb{G}_1$ or in $\mathbb{G}_2$. As our constructions rely on the SXDH assumption, we introduce the following modified assumption, which can be seen as the natural counterpart for a Type-3 setting [30]:

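Definition 6 (co-$t$-Strong DH Assumption (co-$t$-SDH$^*$)). Let $\mathbb{G}_1$ and $\mathbb{G}_2$ be two groups of prime order $p$ (which has bitlength $\kappa$) generated by $P_1 \in \mathbb{G}_1$ and $P_2 \in \mathbb{G}_2$, respectively. Let $\alpha \xleftarrow{R} \mathbb{Z}_p^*$ and let $(\alpha^i P_1)_{i=0}^{t} \in \mathbb{G}_1^{t+1}$ and $(\alpha^i P_2)_{i=0}^{t} \in \mathbb{G}_2^{t+1}$ for some $t > 0$. Then, for every PPT adversary $\mathcal{A}$ there is a negligible function $\epsilon(\cdot)$ such that

$$\Pr\left[\left(c, \tfrac{1}{\alpha + c}P_1\right) \leftarrow \mathcal{A}\left((\alpha^i P_1)_{i=0}^{t}, (\alpha^i P_2)_{i=0}^{t}\right)\right] \le \epsilon(\kappa)$$

for some $c \in \mathbb{Z}_p \setminus \{-\alpha\}$.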
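In a bilinear setting, a claimed solution $(c, S)$ can be checked publicly from the instance alone: it is valid exactly when $e(S, \alpha P_2 + cP_2) = e(P_1, P_2)$, where $\alpha P_2$ is taken from the $\mathbb{G}_2$ part of the instance. The Python sketch below illustrates this check in an insecure toy model in which group elements are stored as discrete logarithms modulo the hypothetical prime $p = 2^{61}-1$; it says nothing about the hardness of the assumption, and the function names are illustrative.

```python
p = 2**61 - 1  # INSECURE toy model: group elements stored as discrete logs mod p

def pair(a, b):
    return a * b % p  # e(aP1, bP2) as an exponent of e(P1, P2)

def instance(alpha, t):
    """co-t-SDH* instance ((alpha^i P1)_{i=0..t}, (alpha^i P2)_{i=0..t}) in exponent form."""
    powers = [pow(alpha, i, p) for i in range(t + 1)]
    return powers, list(powers)

def is_solution(inst, c, S):
    """Check e(S, alpha*P2 + c*P2) == e(P1, P2) using only the instance."""
    g1, g2 = inst
    D = (g2[1] + c * g2[0]) % p  # alpha*P2 + c*P2; zero exactly when c = -alpha
    return D != 0 and pair(S, D) == pair(g1[0], g2[0])

alpha = 123456789  # known here only because we play the instance generator
inst = instance(alpha, t=4)
S = pow(alpha + 7, -1, p)                 # (1/(alpha + 7)) * P1, computable only with alpha
print(is_solution(inst, 7, S))            # True
print(is_solution(inst, 7, (S + 1) % p))  # False
```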
Note that for a compact representation, we make a slight abuse of notation: $P_1$ should be interpreted as $P$ and $P_2$ as $P'$. Obviously, we have co-$t$-SDH$^* \le_p$ $t$-SDH in group $\mathbb{G}_1$. The $t$-SDH assumption was originally proven to be secure in the generic group model in […, Theorem 5.1] and further studied in [32]. The proof is done in a Type-2 pairing setting, where an efficiently computable isomorphism $\Psi: \mathbb{G}_2 \rightarrow \mathbb{G}_1$ exists. In the proof, the adversary is given the problem instance in group $\mathbb{G}_2$ and is allowed to obtain encodings of elements in $\mathbb{G}_1$ through isomorphism queries. As we are in a Type-3 setting, there is no such efficiently computable isomorphism. Thus, the problem instance given to the adversary must contain all corresponding elements in both groups $\mathbb{G}_1$ and $\mathbb{G}_2$. Then, the generic group model proof for the co-$t$-SDH$^*$ assumption can be done analogously to the proof in […, proof of Theorem 5.1]. The main difference is that instead of querying the isomorphism, the adversary must repeat in the other group the same sequence of computations performed in one group, in order to obtain an element containing the same discrete logarithm; this, however, preserves the asymptotic number of queries.

3 Structure-Preserving Signatures on Equivalence Classes

We are looking for an efficient, randomizable structure-preserving signature scheme for vectors with arbitrary numbers of group elements that allows messages and signatures to be randomized consistently in public. It seems natural to consider such messages as representatives of certain equivalence classes and to perform randomization via a change of representatives. Before we can introduce such a signature scheme and give an efficient construction, we detail these equivalence classes.

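All elements of a vector $(M_i)_{i=1}^{\ell} \in (\mathbb{G}_1^*)^{\ell}$ (for some prime order group $\mathbb{G}_1$, where we write $\mathbb{G}_1^*$ for $\mathbb{G}_1 \setminus \{0_{\mathbb{G}_1}\}$) have fixed mutual ratios. These ratios depend on their discrete logarithms and are invariant under the operation $\gamma: \mathbb{Z}_p^* \times (\mathbb{G}_1^*)^{\ell} \rightarrow (\mathbb{G}_1^*)^{\ell}$ with $(s, (M_i)_{i=1}^{\ell}) \mapsto s(M_i)_{i=1}^{\ell}$. Thus, we can use this invariance to partition the set $(\mathbb{G}_1^*)^{\ell}$ into classes using the following equivalence relation:

$$\mathcal{R} = \{(M, N) \in (\mathbb{G}_1^*)^{\ell} \times (\mathbb{G}_1^*)^{\ell} : \exists s \in \mathbb{Z}_p^* \text{ such that } N = s \cdot M\} \subseteq (\mathbb{G}_1^*)^{2\ell}.$$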

It is easy to verify that $\mathcal{R}$ is indeed an equivalence relation given that $\mathbb{G}_1$ has prime order. When signing an equivalence class $[M]_{\mathcal{R}}$ with our scheme, one actually signs an arbitrary representative $(M_i)_{i=1}^{\ell}$ of class $[M]_{\mathcal{R}}$. The scheme then allows one to choose different representatives and to update the corresponding signatures in public, i.e., without any secret key. One of our goals is to guarantee that two message-signature pairs on the same equivalence class cannot be linked. Note that such an approach only seems to work for structure-preserving signature schemes, where we have no direct access to scalars. Otherwise, if we wanted to sign vectors of elements of $\mathbb{Z}_p^*$, the direct access to the scalars would allow us to decide class membership efficiently. This is also the reason why we subsequently define the class hiding property with respect to a random-message attack instead of a chosen-message attack.
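The remark about scalars can be made concrete: for vectors over $\mathbb{Z}_p^*$, membership in the same class is decided by computing the single candidate ratio $s = n_1 m_1^{-1}$ and checking it on every coordinate. A Python sketch of this test (the function name is hypothetical) follows. For vectors in $(\mathbb{G}_1^*)^{\ell}$ no such shortcut is available; for $\ell = 2$, deciding whether $(P, aP)$ and $(bP, cP)$ lie in the same class is exactly the DDH problem.

```python
def same_class(m, n, p):
    """Decide whether n = s * m for some s in Z_p^*, given vectors over Z_p^*."""
    if len(m) != len(n) or not m:
        return False
    s = n[0] * pow(m[0], -1, p) % p  # the only candidate scalar
    return all(ni % p == s * mi % p for mi, ni in zip(m, n))

p = 101
print(same_class([3, 5, 7], [6, 10, 14], p))  # True (s = 2)
print(same_class([3, 5, 7], [6, 10, 15], p))  # False
```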

3.1 Defining the Signature Scheme 

Now, we formally define a signature scheme for the above equivalence relation 
and its required security properties. 

Definition 7 (Structure-Preserving Signature Scheme for Equivalence Relation $\mathcal{R}$ (SPS-EQ-$\mathcal{R}$)). An SPS-EQ-$\mathcal{R}$ scheme consists of the following polynomial time algorithms:

$\mathsf{BGGen}_{\mathcal{R}}(\kappa)$: Is a probabilistic bilinear group generation algorithm, which on input a security parameter $\kappa$ outputs a bilinear group $\mathsf{BG}$.

$\mathsf{KeyGen}_{\mathcal{R}}(\mathsf{BG}, \ell)$: Is a probabilistic algorithm, which on input a bilinear group $\mathsf{BG}$ and a vector length $\ell > 1$ outputs a key pair $(\mathsf{sk}, \mathsf{pk})$.

$\mathsf{Sign}_{\mathcal{R}}(M, \mathsf{sk})$: Is a probabilistic algorithm, which on input a representative $M$ of an equivalence class $[M]_{\mathcal{R}}$ and a secret key $\mathsf{sk}$ outputs a signature $\sigma$ for the equivalence class $[M]_{\mathcal{R}}$ (using randomness $y$).

$\mathsf{ChgRep}_{\mathcal{R}}(M, \sigma, \rho, \mathsf{pk})$: Is a probabilistic algorithm, which on input a representative $M$ of an equivalence class $[M]_{\mathcal{R}}$, the corresponding signature $\sigma$, a scalar $\rho$ and a public key $\mathsf{pk}$ returns an updated message-signature pair $(\tilde{M}, \tilde{\sigma})$ (using randomness $\psi$). Here, $\tilde{M}$ is the new representative $\rho \cdot M$ and $\tilde{\sigma}$ its updated signature.

$\mathsf{Verify}_{\mathcal{R}}(M, \sigma, \mathsf{pk})$: Is a deterministic algorithm, which given a representative $M$, a signature $\sigma$ and a public key $\mathsf{pk}$ outputs true if $\sigma$ is a valid signature for the equivalence class $[M]_{\mathcal{R}}$ under $\mathsf{pk}$ and false otherwise.

When one does not care about which new representative is chosen, $\mathsf{ChgRep}_{\mathcal{R}}$ can be seen as a consistent randomization of a signature and its message using randomizer $\rho$, without invalidating the signature on the equivalence class. The goal is that the signature resulting from $\mathsf{ChgRep}_{\mathcal{R}}$ is indistinguishable from a newly issued signature for the new representative of the same class.

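For security, we require the usual correctness property for signature schemes, but instead of single messages we consider the respective equivalence class and the correctness of $\mathsf{ChgRep}_{\mathcal{R}}$. More formally, we require:

Definition 8 (Correctness). An SPS-EQ-$\mathcal{R}$ scheme is called correct if for all security parameters $\kappa \in \mathbb{N}$, for all $\ell > 1$, for all bilinear groups $\mathsf{BG} \leftarrow \mathsf{BGGen}_{\mathcal{R}}(\kappa)$, all key pairs $(\mathsf{sk}, \mathsf{pk}) \leftarrow \mathsf{KeyGen}_{\mathcal{R}}(\mathsf{BG}, \ell)$ and for all $M \in (\mathbb{G}_1^*)^{\ell}$ it holds that

$$\mathsf{Verify}_{\mathcal{R}}(\mathsf{ChgRep}_{\mathcal{R}}(M, \mathsf{Sign}_{\mathcal{R}}(M, \mathsf{sk}), \rho, \mathsf{pk}), \mathsf{pk}) = \mathsf{true} \quad \forall \rho \in \mathbb{Z}_p^*.$$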

Furthermore, we require a notion of EUF-CMA security. In contrast to the standard definition of EUF-CMA security, we consider a natural adaptation, i.e., outputting a valid message-signature pair corresponding to an unqueried equivalence class is considered to be a forgery.
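Definition 9 (EUF-CMA). An SPS-EQ-$\mathcal{R}$ scheme is called existentially unforgeable under adaptively chosen-message attacks if for all PPT algorithms $\mathcal{A}$ having access to a signing oracle $O(\mathsf{sk}, M)$, there is a negligible function $\epsilon(\cdot)$ such that:

$$\Pr\left[\begin{array}{l} \mathsf{BG} \leftarrow \mathsf{BGGen}_{\mathcal{R}}(\kappa),\ (\mathsf{sk}, \mathsf{pk}) \leftarrow \mathsf{KeyGen}_{\mathcal{R}}(\mathsf{BG}, \ell),\\ (M^*, \sigma^*) \leftarrow \mathcal{A}^{O(\mathsf{sk}, \cdot)}(\mathsf{pk}) :\\ [M^*]_{\mathcal{R}} \neq [M]_{\mathcal{R}}\ \forall M \in Q\ \wedge\ \mathsf{Verify}_{\mathcal{R}}(M^*, \sigma^*, \mathsf{pk}) = \mathsf{true} \end{array}\right] \le \epsilon(\kappa),$$

where $Q$ is the set of queries which $\mathcal{A}$ has issued to the signing oracle $O$.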

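Subsequently, we let $Q$ be a list for keeping track of queried messages $M$ and make use of the following oracles:

$O^{RM}(\ell)$: A random-message oracle, which on input a message vector length $\ell$ picks a message $M \xleftarrow{R} (\mathbb{G}_1^*)^{\ell}$, appends $M$ to $Q$ and returns it.

$O^{RoR}(\mathsf{sk}, \mathsf{pk}, b, M)$: A real-or-random oracle taking input a bit $b$ and a message $M$. If $M \notin Q$, it returns $\perp$. On the first valid call, it chooses $R \xleftarrow{R} (\mathbb{G}_1^*)^{\ell}$, computes $\mathbb{M} \leftarrow ((M, \mathsf{Sign}_{\mathcal{R}}(M, \mathsf{sk})), (R, \mathsf{Sign}_{\mathcal{R}}(R, \mathsf{sk})))$ and returns $\mathbb{M}[b]$. Any subsequent call for $M' \neq M$ returns $\perp$, and any other call returns $\mathsf{ChgRep}_{\mathcal{R}}(\mathbb{M}[b], \rho, \mathsf{pk})$, where $\rho \xleftarrow{R} \mathbb{Z}_p^*$.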

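Definition 10 (Class Hiding). An SPS-EQ-$\mathcal{R}$ scheme on $(\mathbb{G}_1^*)^{\ell}$ is called class hiding if for every PPT adversary $\mathcal{A}$ with oracle access to $O^{RM}$ and $O^{RoR}$, there is a negligible function $\epsilon(\cdot)$ such that

$$\Pr\left[\begin{array}{l} \mathsf{BG} \leftarrow \mathsf{BGGen}_{\mathcal{R}}(\kappa),\ b \xleftarrow{R} \{0, 1\},\ (\mathsf{state}, \mathsf{sk}, \mathsf{pk}) \leftarrow \mathcal{A}(\mathsf{BG}, \ell),\\ \mathcal{O} \leftarrow \{O^{RM}(\ell), O^{RoR}(\mathsf{sk}, \mathsf{pk}, b, \cdot)\},\ b^* \leftarrow \mathcal{A}^{\mathcal{O}}(\mathsf{state}, \mathsf{sk}, \mathsf{pk}) :\\ b^* = b \end{array}\right] \le \frac{1}{2} + \epsilon(\kappa).$$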

Here, the adversary is in the role of a signer, who issues signatures on random messages (in the sense of a random-message attack) and can derive signatures for arbitrary representatives of queried classes. Observe that if the adversary were able to pick messages on its own (e.g., if it knew the discrete logarithms of the group elements or put identical group elements at different positions of the message vectors), it would trivially be able to distinguish the classes. Consequently, we define class hiding in a random-message attack game; the random sampling of messages makes the probability of identical message elements at different positions negligible.

Definition 11 (Security). An SPS-EQ-$\mathcal{R}$ scheme is secure if it is correct, EUF-CMA secure and class hiding.


3.2 Our Construction 

In our construction, we sign vectors of $\ell > 1$ elements of $\mathbb{G}_1$, where the public key only consists of elements in $\mathbb{G}_2$, and we require the SXDH assumption to hold. The signature consists of four group elements, where three elements are from $\mathbb{G}_1$ and one element is from $\mathbb{G}_2$. Two signature elements $(Z_1, Z_2)$ are aggregates of the message vector under $\ell$ elements of the private key. In order to prevent an additive homomorphism on the signatures, we introduce a randomizer $y \in \mathbb{Z}_p^*$, multiply one aggregate with it and introduce two additional values $Y = yP$ and $Y' = yP'$. The latter elements (besides eliminating the homomorphic property) prevent simple forgeries, where $Y'$ contains an aggregation of the public keys $X'_1, \ldots, X'_{\ell}$ in $\mathbb{G}_2$. This is achieved by checking during verification whether $Y$ and $Y'$ contain the same unknown discrete logarithm. Our construction lets us switch to another representative $\tilde{M} = \rho M$ of $M$ by multiplying $M$ and $(Z_1, Z_2)$ with the respective scalar $\rho$. Furthermore, a consistent re-randomization of $\rho Z_2$, $Y$ and $Y'$ with a scalar $\psi$ yields a signature $\tilde{\sigma}$ for $\tilde{M}$ that is unlinkable to the signature $\sigma$ of $M$. In Scheme 1, we present the detailed construction of the SPS-EQ-$\mathcal{R}$ scheme.


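$\mathsf{BGGen}_{\mathcal{R}}(\kappa)$: Given a security parameter $\kappa$, output $\mathsf{BG} \leftarrow \mathsf{BGGen}(\kappa)$.

$\mathsf{KeyGen}_{\mathcal{R}}(\mathsf{BG}, \ell)$: Given a bilinear group description $\mathsf{BG}$ and vector length $\ell > 1$, choose $x \xleftarrow{R} \mathbb{Z}_p^*$ and $(x_i)_{i=1}^{\ell} \xleftarrow{R} (\mathbb{Z}_p^*)^{\ell}$, set the secret key as $\mathsf{sk} \leftarrow (x, (x_i)_{i=1}^{\ell})$, compute the public key $\mathsf{pk} \leftarrow (X', (X'_i)_{i=1}^{\ell}) = (xP', (x_i x P')_{i=1}^{\ell})$ and output $(\mathsf{sk}, \mathsf{pk})$.

$\mathsf{Sign}_{\mathcal{R}}(M, \mathsf{sk})$: On input a representative $M = (M_i)_{i=1}^{\ell} \in (\mathbb{G}_1^*)^{\ell}$ of equivalence class $[M]_{\mathcal{R}}$ and secret key $\mathsf{sk} = (x, (x_i)_{i=1}^{\ell})$, choose $y \xleftarrow{R} \mathbb{Z}_p^*$ and compute

$$Z_1 \leftarrow x \sum_{i=1}^{\ell} x_i M_i, \qquad Z_2 \leftarrow y \sum_{i=1}^{\ell} x_i M_i \qquad \text{and} \qquad (Y, Y') \leftarrow y \cdot (P, P').$$

Then, output $\sigma = (Z_1, Z_2, Y, Y')$ as signature for the equivalence class $[M]_{\mathcal{R}}$.

$\mathsf{ChgRep}_{\mathcal{R}}(M, \sigma, \rho, \mathsf{pk})$: On input a representative $M = (M_i)_{i=1}^{\ell} \in (\mathbb{G}_1^*)^{\ell}$ of equivalence class $[M]_{\mathcal{R}}$, the corresponding signature $\sigma = (Z_1, Z_2, Y, Y')$, $\rho \in \mathbb{Z}_p^*$ and public key $\mathsf{pk}$, this algorithm picks $\psi \xleftarrow{R} \mathbb{Z}_p^*$ and returns $(\tilde{M}, \tilde{\sigma})$, where $\tilde{\sigma} \leftarrow (\rho Z_1, \psi\rho Z_2, \psi Y, \psi Y')$ is the update of signature $\sigma$ for the new representative $\tilde{M} \leftarrow \rho \cdot (M_i)_{i=1}^{\ell}$.

$\mathsf{Verify}_{\mathcal{R}}(M, \sigma, \mathsf{pk})$: Given a representative $M = (M_i)_{i=1}^{\ell} \in (\mathbb{G}_1^*)^{\ell}$ of equivalence class $[M]_{\mathcal{R}}$, a signature $\sigma = (Z_1, Z_2, Y, Y')$ and public key $\mathsf{pk} = (X', (X'_i)_{i=1}^{\ell})$, check whether

$$\prod_{i=1}^{\ell} e(M_i, X'_i) = e(Z_1, P') \;\wedge\; e(Z_1, Y') = e(Z_2, X') \;\wedge\; e(P, Y') = e(Y, P')$$

and if this holds output true, and false otherwise.

Scheme 1. A Construction of an SPS-EQ-$\mathcal{R}$ Scheme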

Note that a signature resulting from $\mathsf{ChgRep}_{\mathcal{R}}$ is indistinguishable from a new signature on the same class using the new representative (it can be viewed as issuing a signature with randomness $\psi \cdot y$).
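The following Python sketch traces Scheme 1 in an insecure toy model used for illustration only: group elements are stored as discrete logarithms modulo the hypothetical prime $p = 2^{61}-1$, the pairing multiplies exponents, and the product $\prod_i e(M_i, X'_i)$ in $\mathbb{G}_T$ becomes a sum of exponents. It offers no security (in particular no class hiding, since exponents are visible), but it lets one check that honest signatures and $\mathsf{ChgRep}_{\mathcal{R}}$ outputs satisfy all three verification equations. Function names are hypothetical.

```python
import secrets

# INSECURE toy model: group elements are stored as discrete logs mod p, the
# pairing multiplies exponents, and a product in GT becomes a sum of exponents.
p = 2**61 - 1

def rnd():
    """Uniform element of Z_p^*."""
    return secrets.randbelow(p - 1) + 1

def pair(a, b):
    return a * b % p

P = P2 = 1  # generators P of G1 and P' of G2

def keygen(l):
    x, xs = rnd(), [rnd() for _ in range(l)]
    pk = (x * P2 % p, [xi * x * P2 % p for xi in xs])  # (X', (X'_i)) = (xP', (x_i x P'))
    return (x, xs), pk

def sign(M, sk):
    x, xs = sk
    y = rnd()
    agg = sum(xi * Mi for xi, Mi in zip(xs, M)) % p  # sum_i x_i M_i
    return (x * agg % p, y * agg % p, y * P % p, y * P2 % p)  # (Z1, Z2, Y, Y')

def chg_rep(M, sig, rho, pk):
    # pk is part of the interface but is not needed by this construction
    Z1, Z2, Y, Y2 = sig
    psi = rnd()
    M_new = [rho * Mi % p for Mi in M]
    return M_new, (rho * Z1 % p, psi * rho * Z2 % p, psi * Y % p, psi * Y2 % p)

def verify(M, sig, pk):
    Z1, Z2, Y, Y2 = sig
    X2, Xs = pk
    if len(M) != len(Xs):
        return False
    lhs = sum(pair(Mi, Xi) for Mi, Xi in zip(M, Xs)) % p  # prod_i e(M_i, X'_i)
    return (lhs == pair(Z1, P2)
            and pair(Z1, Y2) == pair(Z2, X2)
            and pair(P, Y2) == pair(Y, P2))

sk, pk = keygen(3)
M = [rnd() for _ in range(3)]
sig = sign(M, sk)
M2, sig2 = chg_rep(M, sig, rnd(), pk)
print(verify(M, sig, pk), verify(M2, sig2, pk))  # True True
```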


3.3 Security of Our Construction 

In our construction, message vectors are elements of $(\mathbb{G}_1^*)^{\ell}$, public keys are only available in $\mathbb{G}_2$ and signatures are elements of $\mathbb{G}_1$ and $\mathbb{G}_2$. Furthermore, we rely on the SXDH assumption, and it seems very hard (if not impossible) to analyze the EUF-CMA security of the scheme via a reductionist proof using accepted non-interactive assumptions. Abe et al. [3] show that for optimally short structure-preserving signatures, i.e., three-element signatures, such reductions using non-interactive assumptions cannot exist. At present, however, it is not entirely clear how structure-preserving signatures for equivalence relation $\mathcal{R}$ fit into these results and whether the lower bounds from [2] also apply. Independently of this, it appears that a reduction to a (non-interactive) assumption is not possible, since, due to the class hiding property, the winning condition cannot be checked efficiently (without substantially weakening the unforgeability notion). Therefore, we chose to prove the EUF-CMA security of our construction using a direct proof in the generic group model, such as, for instance, the proof of Abe et al. [2, Lemma 1] (cf. […] for the proof).

Now, we state the security of the signature scheme. The corresponding proofs can be found in the full version [37].

Theorem 1. The SPS-EQ-$\mathcal{R}$ scheme in Scheme 1 is correct.

Theorem 2. In the generic group model for SXDH groups, Scheme 1 is an EUF-CMA secure SPS-EQ-$\mathcal{R}$ scheme.

Theorem 3. If the DDH assumption holds in $\mathbb{G}_1$, Scheme 1 is a class hiding SPS-EQ-$\mathcal{R}$ scheme.

Taking everything together, we obtain the following corollary:

Corollary 1. The SPS-EQ-$\mathcal{R}$ scheme in Scheme 1 is secure.


4 Polynomial Commitments with Factor Openings 

In [39], Kate et al. introduced the notion of constant-size polynomial commitments. The authors present two distinct commitment schemes, where one is computationally hiding ($\mathsf{PolyCommit}_{\mathsf{DL}}$) and the other one is unconditionally hiding ($\mathsf{PolyCommit}_{\mathsf{Ped}}$). These constructions are very generic, as they allow one to construct witnesses for opening arbitrary evaluations of committed polynomials.

Yet, we emphasize that in practical scenarios (and especially in our constructions) it is often sufficient to consider the roots of polynomials for encodings and to open factors of the polynomial instead of arbitrary evaluations. Moreover, we need a polynomial commitment scheme that is easily randomizable. Therefore, we introduce the following commitment scheme for monic, reducible polynomials. Instead of opening evaluations, it allows one to open factors of committed polynomials. Hence, we call this type of commitment a polynomial commitment with factor openings. Our construction is unconditionally hiding, computationally binding and more efficient than the Pedersen polynomial commitment construction $\mathsf{PolyCommit}_{\mathsf{Ped}}$ of [39]. Now, we briefly present this construction, which we denote by $\mathsf{PolyCommitFO}$.
$\mathsf{Setup}_{PC}(\kappa, t)$: It takes input a security parameter $\kappa \in \mathbb{N}$ and a maximum polynomial degree $t \in \mathbb{N}$. It runs $\mathsf{BG} \leftarrow \mathsf{BGGen}(\kappa)$, picks $\alpha \xleftarrow{R} \mathbb{Z}_p^*$ and outputs $\mathsf{sk} \leftarrow \alpha$ as well as $\mathsf{pp} \leftarrow (\mathsf{BG}, (\alpha^i P)_{i=1}^{t}, (\alpha^i P')_{i=1}^{t})$.

$\mathsf{Commit}_{PC}(\mathsf{pp}, f(X))$: It takes input the public parameters $\mathsf{pp}$ and a monic, reducible polynomial $f(X) \in \mathbb{Z}_p[X]$ with $\deg f \le t$. It picks $\rho \xleftarrow{R} \mathbb{Z}_p^*$, computes the commitment $C \leftarrow \rho \cdot f(\alpha)P \in \mathbb{G}_1$ and outputs $(C, O)$ with opening information $O \leftarrow (\rho, f(X))$.¹

$\mathsf{Open}_{PC}(\mathsf{pp}, C, \rho, f(X))$: It takes input the public parameters $\mathsf{pp}$, a polynomial commitment $C$, the randomizer $\rho$ used for $C$ and the committed polynomial $f(X)$, and outputs $(\rho, f(X))$.

$\mathsf{Verify}_{PC}(\mathsf{pp}, C, \rho, f(X))$: It takes input the public parameters $\mathsf{pp}$, a polynomial commitment $C$, the randomizer $\rho$ used for $C$ and the committed polynomial $f(X)$. It verifies whether $\rho \stackrel{?}{\neq} 0 \wedge C \stackrel{?}{=} \rho \cdot f(\alpha)P$ holds and outputs true on success and false otherwise.

$\mathsf{FactorOpen}_{PC}(\mathsf{pp}, C, f(X), g(X), \rho)$: It takes input the public parameters $\mathsf{pp}$, a polynomial commitment $C$, the committed polynomial $f(X)$, a factor $g(X)$ of $f(X)$ and the randomizer $\rho$ used for $C$. It computes $h(X) \leftarrow f(X)/g(X)$ and the witness $C_h \leftarrow \rho \cdot h(\alpha)P$, and outputs $(g(X), C_h)$.

$\mathsf{VerifyFactor}_{PC}(\mathsf{pp}, C, g(X), C_h)$: It takes input the public parameters $\mathsf{pp}$, a polynomial commitment $C$ to a polynomial $f(X)$, a polynomial $g(X)$ of positive degree and a corresponding witness $C_h$. It verifies that $g(X)$ is a factor of $f(X)$ by checking whether $C_h \stackrel{?}{\neq} 0_{\mathbb{G}_1} \wedge e(C_h, g(\alpha)P') \stackrel{?}{=} e(C, P')$ holds. It outputs true on success and false otherwise.

In analogy to the security notion in [39], a polynomial commitment scheme with factor openings is secure if it is correct, polynomial binding, factor binding, factor sound, witness sound and hiding. The above scheme can be proven secure under the co-$t$-SDH$^*$ assumption. For the security model and the formal proofs of security, we refer the reader to the full version [37]. Note that one can also define a scheme based on the co-$t$-SDH$^*$ assumption with $C \in \mathbb{G}_1$ and $C_h \in \mathbb{G}_2$. Although this would improve the performance of $\mathsf{VerifyFactor}_{PC}$, we define it differently to reduce the computational complexity of the prover in the ABC system in Section 5.3. Also note that we use the co-$t$-SDH$^*$ assumption in a static way, as $t$ is a system parameter fixed a priori. Finally, observe that $\mathsf{sk} = \alpha$ must remain unknown to the committer (and, thus, the setup has to be run by a TTP), since otherwise it is a trapdoor commitment scheme.

¹ Subsequently, we use $f(\alpha)P$ as shorthand notation for $\sum_i f_i \cdot \alpha^i P$, even if $\alpha$ is unknown.

5 Building an ABC System

In this section, we present an application of the signature scheme and the polynomial commitment scheme introduced in the two previous sections, by using them as basic building blocks for an ABC system. ABC systems are usually constructed in one of the following two ways. Firstly, they can be built from blind
signatures: A user obtains a blind signature from some issuer on (commitments to) attributes and then shows the signature, provides the shown attributes and proves knowledge of all unrevealed attributes [20,12]. The drawback of such a blind signature approach is that such credentials can only be shown once in an unlinkable fashion (one-show). Secondly, anonymous credentials supporting an arbitrary number of unlinkable showings (multi-show) can be obtained in a similar vein using different types of signatures: A user obtains a signature on (commitments to) attributes, then randomizes the signature (such that the resulting signature is unlinkable to the issued one) and proves in zero-knowledge the possession of a signature and the correspondence of this signature with the shown attributes as well as the undisclosed attributes [24,25]. Our approach also achieves multi-show ABCs, but differs significantly from the latter: we randomize both the signature and the message and, thus, do not require costly zero-knowledge proofs (which are otherwise at least linear in the number of shown/encoded attributes) for showing a credential.

Subsequently, we start by discussing the model of ABCs. Then, we provide an intuition for our construction in Section 5.2 and present the scheme in Section 5.3. In Section 5.4, we discuss the security of the construction. Finally, we give a performance comparison with other existing approaches in Section 5.5.


5.1 Abstract Model of ABCs 

In an ABC system, there are different organizations issuing credentials to different users. Users can then anonymously demonstrate possession of these credentials to verifiers. Such a system is called a multi-show ABC system when transactions (issuing and showings) carried out by the same user cannot be linked. A credential $\mathsf{cred}_i$ for user $i$ is issued by an organization $j$ for a set $A = \{(\mathsf{attr}_k, \mathsf{attrV}_k)\}_{k=1}^{n}$ of attribute labels $\mathsf{attr}_k$ and values $\mathsf{attrV}_k$. By $\#A$ we mean the size of $A$, which is defined to be the sum of cardinalities of all second components $\mathsf{attrV}_k$ of the tuples in $A$. Moreover, we denote by $A' \sqsubseteq A$ a subset of the credential's attributes. In particular, for every $k$, $1 \le k \le n$, either $(\mathsf{attr}_k, \mathsf{attrV}_k)$ is missing from $A'$ or $(\mathsf{attr}_k, \mathsf{attrV}'_k)$ with $\mathsf{attrV}'_k \subseteq \mathsf{attrV}_k$ is present. A showing with respect to $A'$ only proves that a valid credential for $A'$ has been issued, but reveals nothing beyond that (selective disclosure).
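One way to make $\#A$ and $A' \sqsubseteq A$ concrete is to model an attribute set as a Python dictionary from labels $\mathsf{attr}_k$ to sets of values $\mathsf{attrV}_k$ (an assumption of this sketch, which takes labels to be unique). The attribute set below is the one used as the example in Section 5.2; function names are hypothetical.

```python
def size(A):
    """#A: the sum of the cardinalities of all value sets attrV_k."""
    return sum(len(vals) for vals in A.values())

def is_sub(A_shown, A):
    """A' ⊑ A: each attribute of A' occurs in A, with a subset of its values."""
    return all(attr in A and vals <= A[attr] for attr, vals in A_shown.items())

A = {"birthdate": {"01.01.1980", ">18"}, "drivinglicense": {"#", "car"}}
print(size(A))                                   # 4
print(is_sub({"drivinglicense": {"#"}}, A))      # True
print(is_sub({"drivinglicense": {"truck"}}, A))  # False
```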

We note that in some ABC system constructions, the entire key generation is 
executed by the Setup algorithm. However, we split these algorithms into three 
algorithms to make the presentation more flexible and convenient. 

Definition 12 (Attribute-Based Anonymous Credential System). An attribute-based anonymous credential (ABC) system consists of the following polynomial time algorithms:

Setup: A probabilistic algorithm that gets a security parameter $\kappa$ and an upper bound $t$ for the size of attribute sets and returns the public parameters $\mathsf{pp}$.

OrgKeyGen: A probabilistic algorithm that takes input the public parameters $\mathsf{pp}$ and $j \in \mathbb{N}$, and produces and outputs a key pair $(\mathsf{osk}_j, \mathsf{opk}_j)$ for organization $j$.

UserKeyGen: A probabilistic algorithm that takes input the public parameters $\mathsf{pp}$ and $i \in \mathbb{N}$, and produces and outputs a key pair $(\mathsf{usk}_i, \mathsf{upk}_i)$ for user $i$.

(Obtain, Issue): These (probabilistic) algorithms are run by user $i$ and organization $j$, who interact during execution. Obtain takes input the public parameters $\mathsf{pp}$, the user's secret key $\mathsf{usk}_i$, an organization's public key $\mathsf{opk}_j$ and an attribute set $A$ of size $\#A \le t$. Issue takes input the public parameters $\mathsf{pp}$, the user's public key $\mathsf{upk}_i$, an organization's secret key $\mathsf{osk}_j$ and an attribute set $A$ of size $\#A \le t$. At the end of this protocol, Obtain outputs a credential $\mathsf{cred}_i$ for $A$ for user $i$.

(Show, Verify): These (probabilistic) algorithms are run by user $i$ and a verifier, who interact during execution. Show takes input the public parameters $\mathsf{pp}$, the user's secret key $\mathsf{usk}_i$, the organization's public key $\mathsf{opk}_j$, a credential $\mathsf{cred}_i$ for set $A$ of size $\#A \le t$ and a second set $A' \sqsubseteq A$. Verify takes input $\mathsf{pp}$, the public key $\mathsf{opk}_j$ and a set $A'$. At the end of the protocol, Verify outputs true or false, indicating whether the credential showing was accepted or not.

An ABC system is called secure if it is correct, unforgeable and anonymous (for formal definitions, we refer the reader to the full version [37]).


5.2 Intuition of Our Construction 

Our construction of ABCs is based on the proposed signature scheme, on polynomial commitments with factor openings and on a single constant-size proof of knowledge (PoK) for guaranteeing freshness. In contrast, the number of proofs of knowledge in other ABC systems, like [23,20] and related approaches, is linear in the number of shown attributes. Nevertheless, aside from selective disclosure of attributes, they allow one to prove statements about non-revealed attribute values, such as AND, OR and NOT, interval proofs, as well as conjunctions and disjunctions of the aforementioned. The expressiveness that we achieve with our construction can be compared to existing alternative constructions of ABCs [26,27]. Namely, our construction supports selective disclosure as well as AND statements about attributes. Thereby, a user can either open some attributes and their corresponding values or solely prove that some attributes are encoded in the respective credential without revealing their concrete values. Furthermore, one may associate sets of values with attributes, such that one is not required to reveal the full attribute value, but only pre-defined "statements" about the attribute value, such as {"01.01.1980", "> 16", "> 18"} for attribute birthdate. This allows us to emulate proving properties about attribute values and, thus, enhances the expressiveness of the system.

Credential Representation: In our construction, a credential $\mathsf{cred}_i$ of user $i$ is a vector of two group elements $(C_1, P)$ together with a signature under the proposed signature scheme (see Section 3.2). During a showing, the credential gets randomized, which is easily achieved by changing the representative. The meaning of its values will be discussed subsequently.
Attribute Representation: We use $\mathsf{PolyCommitFO}$ (cf. Section 4) to commit to a polynomial, which encodes a set of attributes $A = \{(\mathsf{attr}_k, \mathsf{attrV}_k)\}_{k=1}^{n}$ (where the encoding is inspired by [36]). This commitment is represented by the credential value $C_1$.

Now, we show how we use polynomials to encode this set of attributes and values. Thereby, we use a collision-resistant hash function $H: \{0,1\}^* \rightarrow \mathbb{Z}_p^*$ and the following encoding function to generate the polynomials:

$$\mathsf{enc}: A \mapsto \prod_{k=1}^{n} \prod_{M \in \mathsf{attrV}_k} (X - H(\mathsf{attr}_k \| M)).$$

This function is used to encode the set $A$ in the issued credential, the shown attributes $A'$, as well as its complement

$$\bar{A}' = \{(\mathsf{attr}, \mathsf{attrV} \setminus \mathsf{attrV}') : (\mathsf{attr}, \mathsf{attrV}) \in A, (\mathsf{attr}, \mathsf{attrV}') \in A'\} \cup \{(\mathsf{attr}', \mathsf{attrV}) \in A : (\mathsf{attr}', \cdot) \notin A'\}$$

in every showing. The idea is that the credential includes a commitment to the encoding of $A$ and that showings include a witness of the encoding of $\bar{A}'$ (without opening it) as well as $A'$ in plain, for which the encoding is then recomputed by the verifier. To compute these values, we use the $\mathsf{PolyCommitFO}$ public parameters $\mathsf{pp}$, which allow an evaluation of these polynomials in $\mathbb{G}_1$ and $\mathbb{G}_2$ at $\alpha \in \mathbb{Z}_p^*$ (without knowing the trapdoor $\alpha$). Then, the verifier checks whether the multiplicative relationship $\mathsf{enc}(A) = \mathsf{enc}(A') \cdot \mathsf{enc}(\bar{A}')$ between the polynomials is satisfied by checking the multiplicative relationship between the corresponding commitments and witnesses via a pairing equation. More precisely, the commitment to the encoding of $A$ is computed as $C_1 \leftarrow r_i \cdot \mathsf{enc}(A)(\alpha)P$, with $r_i$ being the secret key of user $i$. We note that since no entity knows $\alpha$, we must compute

$$C_1 \leftarrow r_i \cdot \mathsf{enc}(A)(\alpha)P = r_i \cdot \sum_{j=0}^{t} e_j (\alpha^j P), \quad \text{with } \mathsf{enc}(A) = \sum_{j=0}^{t} e_j X^j \in \mathbb{Z}_p[X].$$

The verification of a credential, when showing $A'$, requires checking whether the following holds:

$$\mathsf{VerifyFactor}_{PC}(\mathsf{pp}, C_1, \mathsf{enc}(A'), C_{\bar{A}'}) \stackrel{?}{=} \mathsf{true},$$

where $C_{\bar{A}'} = r_i \cdot \mathsf{enc}(\bar{A}')(\alpha)P$ is part of the showing. A showing then simply amounts to randomizing $C_1$, opening a product of factors of the committed polynomial (representing the selective disclosure), providing a consistently randomized witness of the complementary polynomial and performing a small, constant-size PoK of the randomizer for freshness, as we will see soon.

Example. For the reader's convenience, we include an example of a set $A$. We are given a user with the following set of attributes and values:

$$A = \{(\text{birthdate}, \{\text{"01.01.1980"}, \text{"> 18"}\}), (\text{drivinglicense}, \{\#, \text{car}\})\}.$$

Note that $\#$ indicates an attribute value that allows one to prove possession of the attribute without revealing any concrete value. A showing could, for instance, involve the following attributes $A'$ and their hidden complement $\overline{A'}$:

$$A' = \{(\text{drivinglicense}, \{\#\})\},$$

$$\overline{A'} = \{(\text{birthdate}, \{\text{"01.01.1980"}, \text{"> 18"}\}), (\text{drivinglicense}, \{\text{car}\})\}.$$
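The example can be checked mechanically with a small sketch. It assumes an encoding that maps each (attribute, value) pair to a root $h(\text{attribute}, \text{value}) \in \mathbb{Z}_p$ and sets $\mathsf{enc}(A) = \prod (X - h(\cdot))$; this product-of-linear-factors form, the SHA-256-based hash and the toy prime are assumptions for illustration, not the paper's exact definition of $\mathsf{enc}$. Under that assumption the disclosed part and its complement factor the full encoding.

```python
import hashlib

p = 1019  # toy prime; the text's Z_p uses a cryptographically sized prime


def h(attribute, value):
    """Hypothetical hash of one (attribute, value) pair into Z_p."""
    digest = hashlib.sha256(f"{attribute}|{value}".encode()).digest()
    return int.from_bytes(digest, "big") % p


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % p
    return out


def enc(attrs):
    """Assumed encoding: prod over all pairs of (X - h(attribute, value)),
    returned as a coefficient list from X^0 upward."""
    poly = [1]
    for attribute, values in attrs.items():
        for value in values:
            poly = poly_mul(poly, [(-h(attribute, value)) % p, 1])
    return poly


A = {"birthdate": ["01.01.1980", "> 18"], "drivinglicense": ["#", "car"]}
A_shown = {"drivinglicense": ["#"]}                                          # A'
A_hidden = {"birthdate": ["01.01.1980", "> 18"], "drivinglicense": ["car"]}  # complement

# The disclosed and hidden parts factor the full encoding.
assert enc(A) == poly_mul(enc(A_shown), enc(A_hidden))
```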

Freshness. We have to guarantee that no valid showing transcript can be replayed by someone not in possession of the credential and the user's secret key. To do so, we require the user to conduct a proof of knowledge $\mathsf{PoK}\{\gamma : C_2 = \gamma P\}$ of the discrete logarithm of the second component $C_2 = \rho P$ of a credential, i.e., the value $\rho$, in the showing protocol. This guarantees that we have a fresh challenge for every showing.

In order to prove the anonymity of the ABC system, we need a little trick. We modify the aforementioned PoK and require that the user delivers a proof of knowledge $\mathsf{PoK}\{\gamma : Q = \gamma P \vee C_2 = \gamma P\}$, where $Q$ is an additional value in the public parameters pp with unknown discrete logarithm $q$. Consequently, the user needs to conduct the second part of the proof honestly, while simulating the one for $Q$. In the proof of anonymity, this allows us to let the challenger know $q$ and simulate showings without knowledge of the discrete logarithm of $C_2$, which is required for our reduction to work. Due to the nature of the OR proof, this cannot be detected by the adversary.
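The OR proof can be sketched with the standard technique of composing two Schnorr proofs (Cramer-Damgård-Schoenmakers style): the prover fixes the challenge and response of the branch it cannot prove, derives a matching commitment, and answers the other branch honestly. The sketch below uses a toy, insecure subgroup of $\mathbb{Z}_N^*$ instead of $\mathbb{G}_1$, and the function names `commit`, `respond` and `verify` are hypothetical; it is meant to show the three moves, not the paper's concrete instantiation.

```python
import secrets

# Toy, insecure group (assumption): order-p subgroup of Z_N^*, N = 2p + 1.
p = 1019
N = 2 * p + 1
g = 4


def commit(Q):
    """Prover's first move. The Q-branch is simulated: its challenge and response are
    chosen first and the matching commitment is derived from them."""
    c_sim = secrets.randbelow(p)
    z_sim = secrets.randbelow(p)
    T_Q = pow(g, z_sim, N) * pow(Q, (-c_sim) % p, N) % N   # g^z * Q^(-c)
    k = secrets.randbelow(p)
    T_C2 = pow(g, k, N)                                     # honest Schnorr commitment
    return (T_Q, T_C2), (c_sim, z_sim, k)


def respond(state, rho, c):
    """Prover's last move: split the verifier's challenge c between both branches."""
    c_sim, z_sim, k = state
    c_real = (c - c_sim) % p
    z_real = (k + c_real * rho) % p
    return (c_sim, z_sim), (c_real, z_real)


def verify(Q, C2, commitments, c, resp_Q, resp_C2):
    T_Q, T_C2 = commitments
    (c_Q, z_Q), (c_C2, z_C2) = resp_Q, resp_C2
    return ((c_Q + c_C2) % p == c % p
            and pow(g, z_Q, N) == T_Q * pow(Q, c_Q, N) % N
            and pow(g, z_C2, N) == T_C2 * pow(C2, c_C2, N) % N)


# Q is public with a discrete log the prover does not use; C2 = rho * P.
Q = pow(g, secrets.randbelow(p), N)
rho = secrets.randbelow(p - 1) + 1
C2 = pow(g, rho, N)

commitments, state = commit(Q)            # move 1 (prover)
c = secrets.randbelow(p)                  # move 2 (verifier's challenge)
resp_Q, resp_C2 = respond(state, rho, c)  # move 3 (prover)
accepted = verify(Q, C2, commitments, c, resp_Q, resp_C2)
```

Both branches verify in the same way, so a transcript does not reveal which one was simulated; this is the property the anonymity proof relies on when the challenger knows $q$ instead of $\rho$.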

5.3 The Construction of the ABC System 

Now, we present our ABC system in Scheme 2, where we use the notation $X \leftarrow f(X)$ to indicate that the value of $X$ is overwritten by the result of the evaluation of $f(X)$. Note that if a check does not yield true, the respective algorithm terminates with a failure, and the algorithm Verify accepts only if $\mathsf{VerifyFactor}_{PC}$ and $\mathsf{Verify}_{\mathcal{R}}$ return true and the PoK is valid. Also note that the first move
in the showing protocol can be combined with the first move of the proof of 
knowledge. Therefore, the showing protocol consists of a total of three moves. 


5.4 Security 

In the full version we introduce a security model for attribute-based anonymous credentials and provide formal proofs for the following:

Theorem 4. Scheme 2 is correct.

Theorem 5. If PolyCommitFO is factor-sound, $H$ is a collision-resistant hash function, Scheme 1 is secure and the DLP is hard in $\mathbb{G}_1$, then Scheme 2 is unforgeable.

Theorem 6. If Scheme 1 is class hiding, then Scheme 2 is anonymous.

Taking everything together, we obtain the following corollary: 
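Setup: Given $(\kappa, t)$, run $pp' = (\mathsf{BG}, (\alpha^i P)_{i=1}^{t}, (\alpha^i P')_{i=1}^{t}) \leftarrow \mathsf{Setup}_{PC}(\kappa, t)$ and let $H : \{0,1\}^* \to \mathbb{Z}_p^*$ be a collision-resistant hash function used inside $\mathsf{enc}(\cdot)$. Finally, choose $Q \xleftarrow{\$} \mathbb{G}_1$ and output $pp \leftarrow (P, \mathsf{enc}, Q, pp')$.

OrgKeyGen: Given pp and $j \in \mathbb{N}$, return $(osk_j, opk_j) \leftarrow \mathsf{KeyGen}_{\mathcal{R}}(\mathsf{BG}, 2)$.

UserKeyGen: Given pp and $i \in \mathbb{N}$, pick $r_i \xleftarrow{\$} \mathbb{Z}_p^*$, set $P_i \leftarrow r_i P$ and return $(usk_i, upk_i) \leftarrow (r_i, P_i)$.

(Obtain, Issue): Obtain and Issue interact in the following way:
  1. Obtain(pp, $usk_i$, $opk_j$, $A$) computes $C_1 \leftarrow r_i \cdot \mathsf{enc}(A)(\alpha)P$ and sends $C_1$ to Issue.
  2. Issue(pp, $upk_i$, $osk_j$, $A$) checks $e(C_1, P') = e(P_i, \mathsf{enc}(A)(\alpha)P')$, computes $\sigma \leftarrow \mathsf{Sign}_{\mathcal{R}}((C_1, P), osk_j)$ and sends $\sigma$ to Obtain.
  3. Obtain checks $\mathsf{Verify}_{\mathcal{R}}((C_1, P), \sigma, opk_j) = \mathsf{true}$ and sets $cred_i \leftarrow ((C_1, P), \sigma)$.

(Show, Verify): Show and Verify interact in the following way:
  1. Show(pp, $usk_i$, $opk_j$, $(A, A')$, $cred_i$) picks $\rho \xleftarrow{\$} \mathbb{Z}_p^*$, computes $cred' \leftarrow \mathsf{ChgRep}_{\mathcal{R}}(cred_i, \rho, opk_j)$ and $C_{\overline{A'}} \leftarrow (\rho \cdot usk_i) \cdot \mathsf{enc}(\overline{A'})(\alpha)P$, and sends $(cred', C_{\overline{A'}})$ to Verify.
  2. Verify(pp, $opk_j$, $A'$) parses $cred' = ((C_1, C_2), \sigma)$ and checks $\mathsf{VerifyFactor}_{PC}(pp', C_1, \mathsf{enc}(A'), C_{\overline{A'}}) \wedge \mathsf{Verify}_{\mathcal{R}}(cred', opk_j) = \mathsf{true}$.
  3. Show (as prover) and Verify (as verifier) run $\mathsf{PoK}\{\gamma : Q = \gamma P \vee C_2 = \gamma P\}$.

Scheme 2. A Multi-Show ABC System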


Corollary 2. Scheme 2 is a secure ABC system.

Note that in the proof of Theorem 5 we can distinguish whether a forgery goes back to a signature forgery of Scheme 1 or not. The reason for this is that
the knowledge extractor of the PoK gives us the possibility to extract the used 
credential, which allows us to determine whether a showing is based on a queried 
credential (and, in further consequence, on a queried signature) or not. Hence, 
we are able to efficiently check the winning condition of the EUF-CMA game. 


5.5 Efficiency Analysis and Comparison 

We provide a brief comparison with other ABC approaches and, for completeness, also include the most popular one-show approach. As other candidates for multi-show ABCs, we take the Camenisch-Lysyanskaya schemes [23,24,25] as well as schemes from BBS+ signatures [18,11], which cover a broad class of ABC schemes from randomizable signature schemes with efficient proofs of knowledge. Furthermore, we take two alternative multi-show ABC constructions [26,27] as well as Brands' approach [20] (also covering the provably secure version [12]) for the sake of completeness, although the latter only provides one-show ABCs. We omit other approaches such as [8] that only allow a single attribute per credential. We also omit approaches that achieve more efficient showings for existing ABC systems only in very special cases, such as for attribute values that come from a very small set (and are, thus, hard to compare). Examples are the approach in [22] for CL credentials in the strong RSA setting (encoding attributes as prime numbers) and the one in a pairing-based setting using BBS+ credentials [41] (encoding attributes using accumulators), where the latter additionally requires very large public parameters (one BB signature [15] for every possible attribute value).

Table 1 gives an overview of these systems. Here, Type-1 and Type-2 refer to bilinear group settings with Type-1 and Type-2 pairings, respectively. In a stronger sense, XDH and SXDH stand for bilinear group settings where the former requires the external Diffie-Hellman assumption and the latter requires the SXDH assumption to hold. Furthermore, $\mathbb{G}_q$ denotes a group of prime order $q$ (e.g., a prime order subgroup of $\mathbb{Z}_p^*$ or of an elliptic curve group). By $|\mathbb{G}|$ we mean the bitlength of the representation of an element from group $\mathbb{G}$, and the value $c$ is a constant specified to be approximately 510 bits in [26]. We emphasize that, in contrast to other approaches such as [25,27], our construction only requires a small and constant number of pairing evaluations in all protocol steps. Note that in the issuing step we always assume a computation of $O(L)$ for the user, as we assume that the user checks the validity of the obtained credential on issuing (most of the approaches, including ours, have cost $O(1)$ if this verification is omitted).


Table 1. Comparison of various approaches to ABC systems

                   Parameter size (L attributes)         Issuing                Showing (k-of-L attributes)
Scheme   Setting   PP     PK     Credential size         Issuer  User  Com      Verifier  User    Com
[23,24]  sRSA      O(L)   O(1)   3|Z_n|                  O(L)    O(L)  O(L)     O(L)      O(L)    O(L-k)
[25]     Type-1    O(L)   O(L)   (2L+2)|G_1|             O(L)    O(L)  O(L)     O(L)      O(L)    O(L)
[18]     Type-2    O(L)   O(1)   |G_1| + 2|Z_q|          O(L)    O(L)  O(1)     O(L)      O(L)    O(L)
[26]     Type-2    O(1)   O(L)   L|G_1| + c + |G_2|      O(L)    O(L)  O(L)     O(L)      O(1)    O(1)
[27]     XDH       O(L)   O(L)   (2L+2)(|G_1| + |Z_p|)   O(L)    O(L)  O(L)     O(k)      O(k)    O(k)
[20]     G_q       O(L)   O(1)   2|G_q| + 2|Z_q|         O(L)    O(L)  O(1)     O(k)      O(k)    O(L-k)
Ours     SXDH      O(L)   O(1)   4|G_1| + |G_2|          O(L)    O(L)  O(1)     O(k)      O(L-k)  O(1)


6 Future Work 

The proposed signature scheme seems to be powerful and there might be other 
applications that could benefit, like blind signatures or verifiably-encrypted sig- 
natures. We leave a detailed study and the analysis of such applications as future 
work. Future work also includes constructing revocable and delegatable anony- 
mous credentials from this new approach to ABCs. Furthermore, it is an interest- 
ing question whether the proposed construction is already optimal, whether such 
signatures can be built for other interesting relations and whether it is possible 
to construct such signature schemes whose unforgeability can be proven under 
possible non-interactive assumptions or even to show that this is impossible. 
Abstract. The Schnorr signature scheme is the most efficient signature scheme based on the discrete logarithm problem, and a long line of research investigates the existence of a tight security reduction for this scheme in the random oracle model.

Almost all recent works present lower tightness bounds, and most recently Seurin (Eurocrypt 2012) showed that under certain assumptions the non-tight security proof for Schnorr signatures in the random oracle model by Pointcheval and Stern (Eurocrypt 1996) is essentially optimal. All previous works in this direction rule out tight reductions from the (one-more) discrete logarithm problem. In this paper we introduce a new meta-reduction technique, which shows lower bounds for the large and very natural class of generic reductions. A generic reduction is independent of a particular representation of group elements, and most reductions in state-of-the-art security proofs have this desirable property. Our approach shows unconditionally that there is no tight generic reduction from any natural computational problem $\Pi$ defined over algebraic groups (including even interactive problems) to breaking Schnorr signatures, unless solving $\Pi$ is easy.

Keywords: Schnorr signatures, black-box reductions, generic reductions, alge- 
braic reductions, tightness. 

1 Introduction 

The security of a cryptosystem is nowadays usually confirmed by giving a security 
proof. Typically, such a proof describes a reduction from some (assumed-to-be-)hard 
computational problem to breaking a defined security property of the cryptosystem. A reduction is considered tight if the reduction solving the hard computational problem has essentially the same running time and success probability as the attacker on the cryptosystem. Essentially, a tight reduction means that a successful attacker can be turned into an efficient algorithm for the hard computational problem without any significant increase in the running time and/or significant loss in the success probability.¹ The tightness of a reduction thus determines the strength of the security guarantees provided by the security proof: a non-tight reduction gives weaker security guarantees than a tight one. Moreover, tightness of the reduction affects the efficiency of the cryptosystem when instantiated in practice: a tighter reduction allows one to securely use smaller parameters (shorter moduli, a smaller group size, etc.). Therefore, it is a very desirable property of a cryptosystem to have a tight security reduction.

1 Usually even a polynomially-bounded increase/loss is considered significant if the polynomial may be large. An increase/loss by a small constant factor is not considered significant.
In the domain of digital signatures, tight reductions are known for many fundamental schemes, like Rabin/Williams signatures (Bernstein, Eurocrypt 2008), many strong-RSA-based signatures (Schäge, Eurocrypt 2011 [25]), and RSA Full-Domain Hash (Kakvi and Kiltz, Eurocrypt 2012 [18]). The Schnorr signature scheme [26, 27] is one of the most fundamental public-key cryptosystems. Pointcheval and Stern have shown that Schnorr signatures are provably secure, assuming the hardness of the discrete logarithm (DL) problem [22], in the Random Oracle Model (ROM). However, the reduction of Pointcheval and Stern from DL to breaking Schnorr signatures is not tight: it loses a factor of $q$ in the time-to-success ratio, where $q$ is the number of random oracle queries performed by the forger.
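To see what this loss means for parameter sizes, consider a back-of-the-envelope calculation. It assumes the usual reading of "loses a factor of $q$": a forger with time-to-success ratio $t_{\mathcal{A}}/\varepsilon_{\mathcal{A}}$ yields a DL solver with ratio roughly $q$ times larger (constants ignored). The concrete numbers are illustrative assumptions, not taken from [22].

$$\frac{t_{\mathrm{DL}}}{\varepsilon_{\mathrm{DL}}} \approx q \cdot \frac{t_{\mathcal{A}}}{\varepsilon_{\mathcal{A}}} \quad\Longrightarrow\quad \frac{t_{\mathcal{A}}}{\varepsilon_{\mathcal{A}}} \gtrsim \frac{1}{q} \cdot \frac{t_{\mathrm{DL}}}{\varepsilon_{\mathrm{DL}}}.$$

With $q = 2^{60}$, guaranteeing $t_{\mathcal{A}}/\varepsilon_{\mathcal{A}} \geq 2^{128}$ through this reduction requires $t_{\mathrm{DL}}/\varepsilon_{\mathrm{DL}} \geq 2^{188}$. If generic DL attacks cost about $\sqrt{p}$ group operations, this means a group order of roughly $2^{376}$ instead of the $2^{256}$ a tight reduction would allow.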

A long line of research investigates the existence of tight security proofs for Schnorr signatures. At Asiacrypt 2005, Paillier and Vergnaud [21] gave a first lower bound showing that any algebraic reduction (even in the ROM) converting a forger for Schnorr signatures into an algorithm solving some computational problem $\Pi$ must lose a factor of at least $q^{1/2}$. Their result is quite strong, as they rule out reductions even for adversaries that do not have access to a signing oracle and receive as input the message for which they must forge (UF-NM, see Section A for a formal definition). However, their result also has some limitations: it holds only under the interactive one-more discrete logarithm assumption, they only consider algebraic reductions, and they only rule out tight reductions from the (one-more) discrete logarithm problem. At Crypto 2008, Garg et al. [15] refined this result by improving the bound from $q^{1/2}$ to $q^{2/3}$ with a new analysis, and showed that this bound is optimal if the meta-reduction follows a particular approach for simulating the forger. At Eurocrypt 2012, Seurin [28] finally closed the gap between the security proof of [22] and known impossibility results by describing an elaborate simulation strategy for the forger and providing a new analysis. All previous works [15, 21, 28] on the existence of tight security proofs for Schnorr signatures have the following in common:
have the following in common: 

1. They only rule out the existence of tight reductions from certain strong computational problems, namely the (one-more) discrete logarithm problem. Reductions from weaker problems like, e.g., the computational or decisional Diffie-Hellman problem (CDH/DDH) are not considered.

2. The impossibility results are themselves only valid under the very strong OMDL 
hardness assumption. 

3. They hold only with respect to a limited (but natural) class of reductions, so-called 
algebraic reductions. 

It is not unlikely that the non-existence of a tight reduction from strong computational problems is proven first, and a tight reduction from some weaker problem is found later. A concrete recent example in the domain of digital signatures where this has happened is RSA Full-Domain Hash (RSA-FDH) [4]. First, at Crypto 2000, Coron [8] described a non-tight reduction from solving the RSA problem to breaking the security of RSA-FDH, and at Eurocrypt 2002 he showed [9] that under certain conditions no tighter reduction from RSA can exist. Later, at Eurocrypt 2012, Kakvi and Kiltz [18] gave a tight reduction from solving a weaker problem, the so-called Phi-Hiding problem. The leverage used by Kakvi and Kiltz to circumvent the aforementioned impossibility results was to assume hardness of a weaker computational problem. As all previous works


514 


N. Fleischhacker, T. Jager, and D. Schroder 


rule out only tight reductions from strong computational problems like DL and OMDL, 
this might happen again with Schnorr signatures and the following question was left 
open for 25 years: 

Does a tight security proof for Schnorr signatures based on any weaker com- 
putational problem exist? 

Our contribution. In this work we answer this question in the negative, ruling out the existence of tight reductions in the random oracle model for virtually all natural computational problems defined over abstract algebraic groups. Like previous works, we consider universal unforgeability under no-message attacks (UF-NM-security). Moreover, our results hold unconditionally. In contrast to previous works, we consider generic reductions instead of algebraic reductions, but we believe that this restriction is marginal: the motivation for considering only algebraic reductions in [21] applies equally to generic reductions. In particular, to the best of our knowledge, all known examples of algebraic reductions are generic.

Our main technical contribution is a new approach for the simulation of a forger in a meta-reduction, i.e., "a reduction against the reduction", which differs from previous works [15, 21, 28] and which allows us to show the following main result:

Theorem (Informal). For almost any natural computational problem $\Pi$, there is no tight generic reduction from solving $\Pi$ to breaking the universal unforgeability under no-message attacks of Schnorr signatures in the random oracle model.

Technical approach. We begin with the hypothesis that there exists a tight generic reduction $\mathcal{R}$ from some hard (and possibly interactive) problem $\Pi$ to the UF-NM-security of Schnorr signatures. Then we show that under this hypothesis there exists an efficient algorithm $\mathcal{M}$, a meta-reduction, which efficiently solves $\Pi$. This implies that the hypothesis is false. The meta-reduction $\mathcal{M} = \mathcal{M}^{\mathcal{R}}$ runs $\mathcal{R}$ as a subroutine, efficiently simulating the forger $\mathcal{A}$ for $\mathcal{R}$.

All previous works in this direction [15, 21, 28] followed essentially the same approach. The difficulty with meta-reductions is that $\mathcal{M} = \mathcal{M}^{\mathcal{R}}$ must efficiently simulate the forger $\mathcal{A}$ for $\mathcal{R}$. Previous works resolved this by using a discrete logarithm oracle provided by the OMDL assumption, which allows one to efficiently compute valid signatures in the simulation of forger $\mathcal{A}$. This is the reason why all previous results are only valid under the OMDL assumption and were only able to rule out reductions from the discrete log or the OMDL problem. To overcome these limitations, a new simulation technique is necessary.

We revisit the simulation strategy of $\mathcal{A}$ applied in known meta-reductions and put forward a new technique for proving impossibility results. It turns out that considering generic reductions provides a new leverage to simulate a successful forger efficiently, essentially by suitably re-programming the group representation to compute valid signatures. The technical challenge is to prove that the reduction does not notice that the meta-reduction changes the group representation during the simulation, except with some negligible probability. We show how to prove this by adapting the "low polynomial degree" proof technique of Shoup [30], which was originally introduced to analyze the complexity of certain algorithms for the discrete logarithm problem, to the setting considered in this paper.

This new approach turns out to be extremely powerful, as it allows us to rule out reductions from any (even interactive) representation-invariant computational problem.
Since almost all common hardness assumptions in algebraic groups (e.g., DL, CDH, 
DDH, OMDL, DLIN, etc.) are based on representation-invariant computational prob- 
lems, we are able to rule out tight generic reductions from virtually any natural compu- 
tational problem, without making any additional assumption. Even though we apply it 
specifically to Schnorr signatures, the overall approach is general. We expect that it is 
applicable to other cryptosystems as well. 

Generic reductions vs. algebraic reductions. Similar to algebraic reductions, a generic reduction performs only group operations. The main difference is that the sequence of group operations performed by an algebraic reduction may (but, to the best of our knowledge, in all known examples does not) depend on a particular representation of group elements. A generic reduction, however, is required to work essentially identically for any representation of group elements. Generic reductions are by definition more restrictive than algebraic ones; however, we explain below why we do not consider this restriction very significant.

An obvious question arising with our work is the relation between algebraic and generic reductions. Is a lower bound for generic reductions much less meaningful than a bound for algebraic reductions? We argue that the difference is not very significant. The restriction to algebraic reductions was motivated by the fact that most reductions in known security proofs treat the group as a black box, and thus are algebraic [15, 21, 28]. However, the same motivation applies to generic reductions as well, with exactly the same arguments. In particular, virtually all examples of algebraic reductions in the literature are also generic.

The vast majority of reductions in common security proofs for group-based cryptosystems treat the underlying group as a black box (i.e., work for any representation of the group), and are thus generic. This is a very desirable feature, because then the cryptosystem can securely be instantiated with any group in which the underlying computational problem is hard. In contrast, representation-specific security proofs would require re-proving security for every particular group representation the scheme is used with. Therefore, considering generic reductions seems very reasonable.

Generic reductions vs. security proofs in the generic group model. One might wonder whether our result is implied by previous works (in particular by [28]), since we are considering generic reductions, and for generic algorithms most non-trivial computational problems in algebraic groups are equivalent to the discrete logarithm problem. The conclusion that our result is therefore implied by previous works is, however, not correct.

Note that a reduction does not solve the computational problem alone. It has access to an attacker $\mathcal{A}$. The algorithm which solves the computational problem is a composition $\mathcal{R}(\mathcal{A})$ of $\mathcal{R}$ and $\mathcal{A}$. If both $\mathcal{R}$ and $\mathcal{A}$ were generic algorithms, then the composition $\mathcal{R}(\mathcal{A})$ would also be a generic algorithm, and thus our results would indeed be trivial. But note that we do not require $\mathcal{A}$ to be generic. Therefore the composition $\mathcal{R}(\mathcal{A})$ is in general not a generic algorithm either, and the generic equivalence of DLOG and other problems does not apply. See Section 2.4 and Figure 2 for further explanation.


Further related work. Dodis et al. showed that it is impossible to reduce any computational problem to breaking the security of RSA-FDH in a model where the RSA group $\mathbb{Z}_N^*$ is modeled as a generic group. This result extends [?]. Coron [9] considered the existence of tight security reductions for RSA-FDH signatures. This result was generalized by Dodis and Reyzin and later refined by Kakvi and Kiltz [18].

In the context of Schnorr signatures, Neven et al. [20] described necessary conditions the hash function must meet in order to provide existential unforgeability under chosen-message attacks (EUF-CMA), and showed that these conditions are sufficient if the forger (not the reduction!) is modeled as a generic group algorithm.

In [13], Fischlin and Fleischhacker presented a result, also about the security of Schnorr signatures, which is orthogonal to our result. They show, again under the OMDL assumption, that a large class of reductions has to rely on re-programming the random oracle. Essentially, they prove that in the non-programmable ROM no reduction from the discrete logarithm problem can exist that invokes the adversary only ever on the same input. This class is limited, but encompasses all forking-lemma style reductions used to prove Schnorr signatures secure in the programmable ROM. As said before, the result is orthogonal to our main result, as it considers reductions in the non-programmable ROM.


2 Preliminaries 

Notation. If $S$ is a set, we write $s \xleftarrow{\$} S$ to denote the action of sampling a uniformly random element $s$ from $S$. If $\mathcal{A}$ is a probabilistic algorithm, we denote with $a \xleftarrow{\$} \mathcal{A}$ the action of computing $a$ by running $\mathcal{A}$. We denote with $\emptyset$ the empty string, the empty set, as well as the empty list; the meaning will always be clear from the context. We write $[n]$ to denote the set of integers from 1 to $n$, i.e., $[n] := \{1, \ldots, n\}$.

2.1 Schnorr Signatures 

Let $\mathbb{G}$ be a group of order $p$ with generator $g$, and let $H : \mathbb{G} \times \{0,1\}^k \to \mathbb{Z}_p$ be a hash function. The Schnorr signature scheme [26, 27] consists of the following efficient algorithms (Gen, Sign, Vrfy).

Gen(g): The key generation algorithm takes as input a generator $g$ of $\mathbb{G}$. It chooses $x \xleftarrow{\$} \mathbb{Z}_p$, computes $X := g^x$, and outputs $(X, x)$.

Sign(x, m): The input of the signing algorithm is a private key $x$ and a message $m \in \{0,1\}^k$. It chooses a random integer $r \xleftarrow{\$} \mathbb{Z}_p$, sets $R := g^r$ as well as $c := H(R, m)$, computes $y := x \cdot c + r \bmod p$, and outputs the signature $(R, y)$.

Vrfy(X, m, (R, y)): The verification algorithm outputs the truth value of $g^y = X^c \cdot R$, where $c = H(R, m)$.
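The three algorithms translate directly into code. The sketch below is a toy, insecure instantiation under stated assumptions: $\mathbb{G}$ is the order-$p$ subgroup of $\mathbb{Z}_N^*$ with $N = 2p + 1$ and $p = 1019$, and $H$ is SHA-256 applied to a 2-byte encoding of $R$ followed by $m$, reduced modulo $p$. Real deployments use groups of roughly 256-bit order.

```python
import hashlib
import secrets

# Toy, insecure instantiation (assumption): G is the order-p subgroup of Z_N^* with
# N = 2p + 1 a safe prime, and H hashes a 2-byte encoding of R together with m.
p = 1019
N = 2 * p + 1
g = 4


def H(R, m):
    return int.from_bytes(hashlib.sha256(R.to_bytes(2, "big") + m).digest(), "big") % p


def gen():
    x = secrets.randbelow(p)
    return pow(g, x, N), x                 # (X, x)


def sign(x, m):
    r = secrets.randbelow(p)
    R = pow(g, r, N)
    c = H(R, m)
    return R, (x * c + r) % p              # (R, y)


def vrfy(X, m, sig):
    R, y = sig
    c = H(R, m)
    return pow(g, y, N) == pow(X, c, N) * R % N


X, x = gen()
m = b"example message"
R, y = sign(x, m)
```

Correctness follows from $g^y = g^{xc + r} = X^c \cdot R$.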

Remark 1. Note that the above description of Schnorr signatures deviates slightly from the original description in [26, 27], where a signature consists of $(c, y)$ instead of $(R, y)$, which reduces the length of signatures significantly. However, it is possible to compute $R$ from $(c, y)$ (and the public key $X$) as $R := g^y \cdot X^{-c}$. Similarly, it is possible to compute $c$ from $(R, m)$ as $c := H(R, m)$. Thus both representations are equivalent. In particular, changing between these two representations does not affect our results.
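The conversion between the two signature forms can be checked with a short sketch, reusing the same toy group and hash assumptions as above (order-$p$ subgroup of $\mathbb{Z}_N^*$, SHA-256 over a 2-byte encoding of $R$ and $m$); the helper names are illustrative.

```python
import hashlib
import secrets

p, N, g = 1019, 2039, 4   # toy group as before (assumption)


def H(R, m):
    return int.from_bytes(hashlib.sha256(R.to_bytes(2, "big") + m).digest(), "big") % p


def to_original_form(R, y, m):
    """(R, y) -> (c, y) as in the original description."""
    return H(R, m), y


def recover_R(X, c, y):
    """(c, y) -> R via R := g^y * X^(-c)."""
    return pow(g, y, N) * pow(X, (-c) % p, N) % N


x = secrets.randbelow(p)
X = pow(g, x, N)
m = b"example message"
r = secrets.randbelow(p)
R = pow(g, r, N)
c, y = to_original_form(R, (x * H(R, m) + r) % p, m)
```

Since $y = xc + r$, we get $g^y \cdot X^{-c} = g^{r} = R$, so no information is lost in either direction.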

2.2 Computational Problems 

Let $\mathbb{G}$ be a cyclic group of order $p$ and $g \in \mathbb{G}$ a generator of $\mathbb{G}$. We write $\mathsf{desc}(\mathbb{G}, g)$ to denote the list of group elements $\mathsf{desc}(\mathbb{G}, g) = (g, g^2, \ldots, g^p) \in \mathbb{G}^p$. We say that $\mathsf{desc}(\mathbb{G}, g)$ is the enumerating description of $\mathbb{G}$ with respect to $g$.

Definition 1. A computational problem $\Pi$ in $\mathbb{G}$ is specified by three (computationally unbounded) procedures $\Pi = (\mathcal{G}_\Pi, \mathcal{S}_\Pi, \mathcal{V}_\Pi)$, with the following syntax.

$\mathcal{G}_\Pi(\mathsf{desc}(\mathbb{G}, g))$ takes as input an enumerating description of $\mathbb{G}$, and outputs a state $st$ and a problem instance (the challenge) $C = (C_1, \ldots, C_u, C') \in \mathbb{G}^u \times \{0,1\}^*$. We assume in the sequel that at least $C_1$ is a generator of $\mathbb{G}$.

$\mathcal{S}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, Q)$ takes as input $\mathsf{desc}(\mathbb{G}, g)$, a state $st$, and $Q = (Q_1, \ldots, Q_v, Q') \in \mathbb{G}^v \times \{0,1\}^*$, and outputs $(st', \Lambda)$, where $st'$ is an updated state and $\Lambda = (\Lambda_1, \ldots, \Lambda_{v'}, \Lambda') \in \mathbb{G}^{v'} \times \{0,1\}^*$.

$\mathcal{V}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, C, S)$ takes as input $(\mathsf{desc}(\mathbb{G}, g), st, C)$ as defined above, and $S = (S_1, \ldots, S_w, S') \in \mathbb{G}^w \times \{0,1\}^*$. It outputs 0 or 1.

If $\mathcal{S}_\Pi$ always responds with $\Lambda = \emptyset$ (i.e., the empty string), then we say that $\Pi$ is non-interactive. Otherwise, it is interactive. The exact description and distribution of $st, C, Q, \Lambda, S$ depend on the considered computational problem.

Definition 2. An algorithm $\mathcal{A}$ $(\varepsilon, t)$-solves the computational problem $\Pi$ if $\mathcal{A}$ has running time at most $t$ and wins the following interactive game against a (computationally unbounded) challenger $\mathcal{C}$ with probability at least $\varepsilon$, where the game is defined as follows:

1. The challenger $\mathcal{C}$ generates an instance of the problem $(st, C) \xleftarrow{\$} \mathcal{G}_\Pi(\mathsf{desc}(\mathbb{G}, g))$ and sends $C$ to $\mathcal{A}$.

2. $\mathcal{A}$ is allowed to issue an arbitrary number of oracle queries to $\mathcal{C}$. To this end, $\mathcal{A}$ provides $\mathcal{C}$ with a query $Q$. $\mathcal{C}$ runs $(st', \Lambda) \xleftarrow{\$} \mathcal{S}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, Q)$, updates the state $st := st'$, and responds with $\Lambda$.

3. Finally, algorithm $\mathcal{A}$ outputs a candidate solution $S$. The algorithm $\mathcal{A}$ wins the game (i.e., solves the computational problem correctly) iff $\mathcal{V}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, C, S) = 1$.

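Example 1. The discrete logarithm problem in $\mathbb{G}$ is specified by the following procedures. $\mathcal{G}_\Pi(\mathsf{desc}(\mathbb{G}, g))$ outputs $(st, C)$ with $st = \emptyset$ and $C = (g, h)$, where $h \xleftarrow{\$} \mathbb{G}$ is a random group element. $\mathcal{S}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, Q)$ always outputs $(st', \Lambda) = (st, \emptyset)$. $\mathcal{V}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, C, S)$ interprets $S = S' \in \{0,1\}^*$ canonically as an integer in $\mathbb{Z}_p$, and outputs 1 iff $h = g^{S'}$.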
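Example 1 maps onto the triple $(\mathcal{G}_\Pi, \mathcal{S}_\Pi, \mathcal{V}_\Pi)$ as sketched below. The sketch assumes a toy group (order-$p$ subgroup of $\mathbb{Z}_N^*$ with $p = 1019$), represents $\emptyset$ by `None`, and encodes bit strings as Python strings of `0`/`1`; because the group is tiny, the "computationally unbounded" solver can simply look $h$ up in the enumerating description.

```python
import secrets

# Toy, insecure parameters (assumption): G is the order-p subgroup of Z_N^*, N = 2p + 1.
p = 1019
N = 2 * p + 1
g = 4


def desc(gen):
    """Enumerating description (gen, gen^2, ..., gen^p) of G; the last entry is 1."""
    return [pow(gen, i, N) for i in range(1, p + 1)]


def G_dl(d):
    """Instance generator: st = empty (None here), C = (g, h) with h random in G."""
    return None, (d[0], secrets.choice(d))


def S_dl(d, st, Q):
    """Non-interactive problem: the oracle always answers (st, empty)."""
    return st, None


def V_dl(d, st, C, S):
    """Interpret the bit string S canonically as an integer and check h == g^S."""
    gen, h = C
    return pow(gen, int(S, 2) % p, N) == h


d = desc(g)
st, C = G_dl(d)
# An unbounded solver may simply look h up in the enumerating description.
s = d.index(C[1]) + 1
S = format(s, "b")
```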

Example 2. We describe the $u$-one-more discrete logarithm problem ($u$-OMDL) in $\mathbb{G}$ with the following algorithms. $\mathcal{G}_\Pi(\mathsf{desc}(\mathbb{G}, g))$ outputs $(st, C)$, where $C = (C_1, \ldots, C_u) \xleftarrow{\$} \mathbb{G}^u$ consists of $u$ random group elements and $st = 0$. The algorithm $\mathcal{S}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, Q)$ takes as input state $st$ and group element $Q \in \mathbb{G}$. It responds with $st' := st + 1$ and $\Lambda = \Lambda' \in \{0,1\}^*$, where $\Lambda'$, canonically interpreted as an integer in $\mathbb{Z}_p$, satisfies $g^{\Lambda'} = Q$. The verification algorithm $\mathcal{V}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, C, S)$ interprets $S = (S'_1, \ldots, S'_u) \in \{0,1\}^*$ canonically as a vector of $u$ integers in $\mathbb{Z}_p$, and outputs 1 iff $st < u$ and $g^{S'_i} = C_i$ for all $i \in [u]$.
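The role of the counter $st$ in Example 2 becomes clear in code: the oracle increments it on every query, and the verifier accepts only if strictly fewer than $u$ queries were made. The sketch below assumes the same toy group as before, leaves $\mathsf{desc}(\mathbb{G}, g)$ implicit, and uses brute force both for the unbounded oracle and (unrealistically) for the one "extra" logarithm.

```python
import secrets

# Toy group (assumption): order-p subgroup of Z_N^*, N = 2p + 1, generator 4.
# desc(G, g) is left implicit; everything is computed from (p, N, g).
p, N, g = 1019, 2039, 4


def brute_force_dl(Q):
    """Computationally unbounded helper; feasible only because the group is tiny."""
    for a in range(p):
        if pow(g, a, N) == Q:
            return a
    raise ValueError("Q is not in the subgroup generated by g")


def G_omdl(u):
    """Instance generator: st = 0 and u random group elements."""
    return 0, [pow(g, secrets.randbelow(p), N) for _ in range(u)]


def S_omdl(st, Q):
    """DL oracle: answers one query and increments the query counter."""
    return st + 1, brute_force_dl(Q)


def V_omdl(st, C, S):
    """Accept iff fewer than u = len(C) oracle queries were made and all logs are right."""
    return (st < len(C) and len(S) == len(C)
            and all(pow(g, s % p, N) == Ci for s, Ci in zip(S, C)))


u = 3
st, C = G_omdl(u)

# Querying the oracle on all u challenges yields correct logs but is not a win.
st_all, S_all = st, []
for Ci in C:
    st_all, a = S_omdl(st_all, Ci)
    S_all.append(a)

# Using only u - 1 queries (the last log is brute-forced here, which a real
# adversary cannot do) satisfies the "one more" condition.
st_few, S_few = st, []
for Ci in C[:-1]:
    st_few, a = S_omdl(st_few, Ci)
    S_few.append(a)
S_few.append(brute_force_dl(C[-1]))
```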

Example 3. The UF-NM-forgery problem for Schnorr signatures in $\mathbb{G}$ with hash function $H$ is specified by the following procedures. $\mathcal{G}_\Pi(\mathsf{desc}(\mathbb{G}, g))$ outputs $(st, C)$ with $st = m$ and $C = (g, X, m) \in \mathbb{G}^2 \times \{0,1\}^k$, where $X = g^x$ for $x \xleftarrow{\$} \mathbb{Z}_p$ and $m \xleftarrow{\$} \{0,1\}^k$. $\mathcal{S}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, Q)$ always outputs $(st', \Lambda) = (st, \emptyset)$. The verification algorithm $\mathcal{V}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, C, S)$ parses $S$ as $S = (R, y) \in \mathbb{G} \times \mathbb{Z}_p$, sets $c := H(R, st)$, and outputs 1 iff $X^c \cdot R = g^y$.

2.3 Representation-Invariant Computational Problems 

In our impossibility results given below, we want to rule out the existence of a tight 
reduction from as large a class of computational problems as possible. Ideally, we want 
to rule out the existence of a tight reduction from any computational problem that meets Definition 1. However, it is easy to see that this is not achievable in this generality: as Example 3 shows, the problem of forging Schnorr signatures itself is a problem that meets Definition 1. However, of course there exists a trivial tight reduction from the problem of forging Schnorr signatures to the problem of forging Schnorr signatures! Therefore we need to restrict the class of considered computational problems to exclude such trivial, artificial problems.

We introduce the notion of representation-invariant computational problems. This 
class of problems captures virtually any reasonable computational problem defined over 
an abstract algebraic group, even interactive assumptions, except for a few extremely 
artificial problems. In particular, the problem of forging Schnorr signatures is not contained in this class (see Example 5 below).

Intuitively, a computational problem is representation-invariant if a valid solution to a given problem instance remains valid even if the representation of group elements in challenges, oracle queries, and solutions is converted to a different representation of the same group. More formally:

Definition 3. Let $\mathbb{G}, \tilde{\mathbb{G}}$ be groups such that there exists an isomorphism $\phi : \mathbb{G} \to \tilde{\mathbb{G}}$. We say that $\Pi$ is representation-invariant if for all isomorphic groups $\mathbb{G}, \tilde{\mathbb{G}}$, all generators $g \in \mathbb{G}$, all $C = (C_1, \ldots, C_u, C') \xleftarrow{\$} \mathcal{G}_\Pi(\mathsf{desc}(\mathbb{G}, g))$, all $st = (st_1, \ldots, st_t, st') \in \mathbb{G}^t \times \{0,1\}^*$, and all $S = (S_1, \ldots, S_w, S') \in \mathbb{G}^w \times \{0,1\}^*$ it holds that

$$\mathcal{V}_\Pi(\mathsf{desc}(\mathbb{G}, g), st, C, S) = 1 \iff \mathcal{V}_\Pi(\mathsf{desc}(\tilde{\mathbb{G}}, \tilde{g}), \tilde{st}, \tilde{C}, \tilde{S}) = 1,$$

where $\tilde{g} = \phi(g) \in \tilde{\mathbb{G}}$, $\tilde{C} = (\phi(C_1), \ldots, \phi(C_u), C')$, $\tilde{st} = (\phi(st_1), \ldots, \phi(st_t), st')$, and $\tilde{S} = (\phi(S_1), \ldots, \phi(S_w), S')$.

Observe that this definition only demands the existence of an isomorphism $\phi : \mathbb{G} \to \tilde{\mathbb{G}}$ and not that it is efficiently computable.

Example 4. The discrete logarithm problem is representation-invariant. Let $C = (g, h) \in \mathbb{G}^2$ be a discrete log challenge, with corresponding solution $S' \in \{0,1\}^*$ such that $S'$, canonically interpreted as an integer $S' \in \mathbb{Z}_p$, satisfies $g^{S'} = h \in \mathbb{G}$. Let $\phi : \mathbb{G} \to \tilde{\mathbb{G}}$ be an isomorphism, and let $(\tilde{g}, \tilde{h}) := (\phi(g), \phi(h))$. Then, since $\phi(g)^{S'} = \phi(g^{S'}) = \phi(h)$, it clearly holds that $\tilde{g}^{\tilde{S}'} = \tilde{h}$, where $\tilde{S}' = S'$.

Virtually all common hardness assumptions in algebraic groups are based on re- 
presentation-invariant computational problems. Popular examples are, for instance, the 
discrete log problem (DL), computational Diffie-Hellman (CDH), decisional Diffie- 
Hellman (DDH), one-more discrete log (OMDL), decision linear (DLIN), and so on. 

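Example 5. The UF-NM-forgery problem for Schnorr signatures with hash function $H$ is not representation-invariant for any hash function $H$. Let $C = (g, X, m) \xleftarrow{\$} \mathcal{G}_\Pi(\mathsf{desc}(\mathbb{G}, g))$ be a challenge with solution $S = (R, y) \in \mathbb{G} \times \mathbb{Z}_p$ satisfying $X^c \cdot R = g^y$, where $c := H(R, m)$.

Let $\tilde{\mathbb{G}}$ be a group isomorphic to $\mathbb{G}$ such that $\mathbb{G} \cap \tilde{\mathbb{G}} = \emptyset$ (that is, there exists no element of $\tilde{\mathbb{G}}$ having the same representation as some element of $\mathbb{G}$).² Let $\phi : \mathbb{G} \to \tilde{\mathbb{G}}$ denote the isomorphism, and write $\tilde{g} = \phi(g)$, $\tilde{X} = \phi(X)$, $\tilde{R} = \phi(R)$. If there exists any $R$ such that $H(R, m) \neq H(\phi(R), m)$ in $\mathbb{Z}_p$ (which holds in particular if $H$ is collision resistant), then for such $R$ (and $X \neq 1$) we have

$$\tilde{g}^{y} = \tilde{X}^{H(R, m)} \cdot \tilde{R} \neq \tilde{X}^{H(\tilde{R}, m)} \cdot \tilde{R}.$$

Thus, a solution to this problem is valid only with respect to a particular given representation of group elements.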

The UF-NM-forgery problem of Schnorr signatures is not representation-invariant, because a solution to this problem involves the hash value $H(R, m)$, which depends on a concrete representation of the group element $R$. We consider such complexity assumptions rather unnatural, as they are usually very specific to certain constructions of cryptosystems.
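The contrast between Example 4 and Example 5 can be checked on a toy group. The following Python sketch rests on assumptions of our own: $\mathbb{G}$ is the subgroup of order 1019 in $\mathbb{Z}_{2039}^*$ generated by 4, $\hat{\mathbb{G}}$ is the same group with every element tagged by the fixed string "hat" (the construction of footnote 2), and $H$ is SHA-256 applied to the printed representation of $(R, m)$. The names `Group`, `phi`, `dl_valid`, `schnorr_valid`, and `sign` are hypothetical helpers, not part of the paper.

```python
import hashlib
import secrets

# Toy parameters (an assumption, far too small for real use): MODULUS = 2*ORDER + 1
# with ORDER prime, so GEN = 4 generates the subgroup of prime order ORDER.
MODULUS, ORDER, GEN = 2039, 1019, 4


class Group:
    """The order-ORDER subgroup of Z_MODULUS^*, with elements in a chosen representation."""

    def __init__(self, encode, decode):
        self.encode, self.decode = encode, decode

    def mul(self, a, b):
        return self.encode(self.decode(a) * self.decode(b) % MODULUS)

    def exp(self, a, k):
        return self.encode(pow(self.decode(a), k % ORDER, MODULUS))


# G uses plain integers; G_HAT tags every element, so no element of G_HAT
# has the same representation as an element of G (cf. footnote 2).
G = Group(lambda v: v, lambda v: v)
G_HAT = Group(lambda v: ("hat", v), lambda v: v[1])


def phi(h):
    """The isomorphism G -> G_HAT."""
    return ("hat", h)


def H(R, m):
    """Hash of the *representation* of R together with m, reduced mod ORDER."""
    digest = hashlib.sha256(repr((R, m)).encode()).digest()
    return int.from_bytes(digest, "big") % ORDER


def dl_valid(grp, g, h, s):
    """Verifier of the discrete log problem: does g^s = h hold?"""
    return grp.exp(g, s) == h


def schnorr_valid(grp, g, X, m, R, y):
    """Verifier of the UF-NM problem: does g^y = X^c * R hold for c = H(R, m)?"""
    return grp.exp(g, y) == grp.mul(grp.exp(X, H(R, m)), R)


def sign(x, m):
    """Schnorr signing in G with secret key x (only used to create instances)."""
    r = secrets.randbelow(ORDER)
    R = G.exp(GEN, r)
    return R, (r + x * H(R, m)) % ORDER
```

In this sketch a discrete logarithm solution transfers to $\hat{\mathbb{G}}$ unchanged, whereas for a public key $X \neq 1$ a signature $(R, y)$ that verifies in $\mathbb{G}$ verifies as $(\phi(R), y)$ in $\hat{\mathbb{G}}$ only if $H(R, m)$ and $H(\phi(R), m)$ happen to coincide modulo the group order.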

2.4 Generic Reductions 

In this section we recall the notion of generic groups, loosely following [30] (cf. also [19, 24], for instance), and define generic (i.e., representation-independent) reductions.

Generic groups. Let $(\mathbb{G}, \cdot)$ be a group of order $p$ and $E \subseteq \{0,1\}^{\lceil \log_2 p \rceil}$ be a set of size $|E| = |\mathbb{G}|$. If $g, h \in \mathbb{G}$ are two group elements, then we write $g \div h$ for $g \cdot h^{-1}$. Following [30], we define an encoding function as a random injective map $\phi: \mathbb{G} \to E$. We say that an element $e \in E$ is the encoding assigned to group element $h \in \mathbb{G}$ if $\phi(h) = e$.

A generic group algorithm is an algorithm $\mathcal{R}$ which takes as input $\hat C = (\phi(C_1), \dots, \phi(C_u), C')$, where $\phi(C_i) \in E$ is an encoding of group element $C_i$ for all $i \in [u]$, and $C' \in \{0,1\}^*$ is a bit string. The algorithm outputs $\hat S = (\phi(S_1), \dots, \phi(S_w), S')$, where $\phi(S_i) \in E$ is an encoding of group element $S_i$ for all $i \in [w]$, and $S' \in \{0,1\}^*$ is a bit string. In order to perform computations on encoded group elements, algorithm $\mathcal{R} = \mathcal{R}^{O}$ may query a generic group oracle (or "group oracle" for short). This oracle $O$ takes as input two encodings $e = \phi(G)$, $e' = \phi(G')$ and a symbol $\circ \in \{\cdot, \div\}$, and returns $\phi(G \circ G')$. Note that $(E, \cdot_O)$, where $\cdot_O$ denotes the group operation on $E$ induced by oracle $O$, forms a group which is isomorphic to $(\mathbb{G}, \cdot)$.

² Such a group $\hat{\mathbb{G}}$ can trivially be obtained for any group $\mathbb{G}$, for instance by modifying the encoding by prepending a suitable fixed string to each group element, and changing the group law accordingly.

PROC $O(e, e', \circ)$
  $(e, e', \circ) \in E \times E \times \{\cdot, \div\}$
  $(i, j) := \mathsf{GetIdx}(e, e')$
  return $\mathsf{Encode}(\mathcal{L}^G_i \circ \mathcal{L}^G_j)$

PROC $\mathsf{GetIdx}(\mathbf{e})$
  parse $\mathbf{e} = (e_1, \dots, e_w)$
  for $j = 1, \dots, w$ do
    pick first $i \in [|\mathcal{L}^E|]$ such that $\mathcal{L}^E_i = e_j$
    $i_j := i$
  return $(i_1, \dots, i_w)$

PROC $\mathsf{Encode}(\mathbf{G})$
  parse $\mathbf{G} = (G_1, \dots, G_u)$
  for $j = 1, \dots, u$ do
    if $\exists i$ s.t. $\mathcal{L}^G_i = G_j$ then $e_j := \mathcal{L}^E_i$
    else $e_j \leftarrow_{\$} E \setminus \mathcal{L}^E$
    append $e_j$ to $\mathcal{L}^E$
    append $G_j$ to $\mathcal{L}^G$
  return $(e_1, \dots, e_u)$

Fig. 1. Procedures implementing the generic group oracle

It will later be helpful to have a specific implementation of $O$. We will therefore assume in the sequel that $O$ internally maintains two lists $\mathcal{L}^G \subseteq \mathbb{G}$ and $\mathcal{L}^E \subseteq E$. These lists define the encoding function $\phi$ as $\mathcal{L}^E_i = \phi(\mathcal{L}^G_i)$, where $\mathcal{L}^G_i$ and $\mathcal{L}^E_i$ denote the $i$-th element of $\mathcal{L}^G$ and $\mathcal{L}^E$, respectively, for all $i \in [|\mathcal{L}^G|]$. Note that from the perspective of a generic group algorithm it makes no difference whether the encoding function is fixed at the beginning or lazily evaluated whenever a new group element occurs. We will assume that the oracle uses lazy evaluation to simplify our discussion and avoid unnecessary steps for achieving polynomial runtime of our meta-reductions.
Procedure Encode takes a list $\mathbf{G} = (G_1, \dots, G_u)$ of group elements as input. It checks for each $G_j \in \mathbf{G}$ whether an encoding has already been assigned to $G_j$, that is, whether there exists an index $i$ such that $\mathcal{L}^G_i = G_j$. If this holds, Encode sets $e_j := \mathcal{L}^E_i$. Otherwise (if no encoding has been assigned to $G_j$ so far), it chooses a fresh and random encoding $e_j \leftarrow_{\$} E \setminus \mathcal{L}^E$. In either case $G_j$ and $e_j$ are appended to $\mathcal{L}^G$ and $\mathcal{L}^E$, respectively, which gradually defines the map $\phi$ such that $\phi(G_j) = e_j$. Note also that the same group element and encoding may occur multiple times in the list. Finally, the procedure returns the list $(e_1, \dots, e_u)$ of encodings.

Procedure GetIdx takes a list $(e_1, \dots, e_w)$ of encodings as input. For each $j \in [w]$ it defines $i_j$ as the smallest index such that $e_j = \mathcal{L}^E_{i_j}$,³ and returns $(i_1, \dots, i_w)$.⁴


³ Recall that the same encoding may occur multiple times in $\mathcal{L}^E$.

⁴ Note that GetIdx may receive only encodings $e_1, \dots, e_w$ which are already contained in $\mathcal{L}^E$, as otherwise the behavior of GetIdx is undefined. We will make sure that this is always the case.
The lists $\mathcal{L}^G, \mathcal{L}^E$ are initially empty. Then $O$ calls $(e_1, \dots, e_u) \leftarrow_{\$} \mathsf{Encode}(C_1, \dots, C_u)$ to determine encodings for all group elements $C_1, \dots, C_u$ and starts the generic group algorithm on input $\mathcal{R}(e_1, \dots, e_u, C')$.

$\mathcal{R}^O$ may now submit queries of the form $(e, e', \circ) \in E \times E \times \{\cdot, \div\}$ to the generic group oracle $O$. In the sequel we will restrict $\mathcal{R}$ to issue only queries $(e, e', \circ)$ to $O$ such that $e, e' \in \mathcal{L}^E$. $O$ determines the smallest indices $i$ and $j$ with $e = \mathcal{L}^E_i$ and $e' = \mathcal{L}^E_j$ by calling $(i, j) = \mathsf{GetIdx}(e, e')$. Then it computes $\mathcal{L}^G_i \circ \mathcal{L}^G_j$ and returns the encoding $\mathsf{Encode}(\mathcal{L}^G_i \circ \mathcal{L}^G_j)$. Furthermore, we require that $\mathcal{R}$ only outputs encodings $\phi(S_i)$ such that $\phi(S_i) \in \mathcal{L}^E$.
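The lazily sampled oracle of Figure 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the underlying group is modelled as $(\mathbb{Z}_p, +)$, so the symbols `"mul"` and `"div"` stand for $\cdot$ and $\div$ (addition and subtraction modulo $p$), and encodings are random 32-bit integers rather than elements of $\{0,1\}^{\lceil \log_2 p \rceil}$. The class name `GenericGroupOracle` is hypothetical.

```python
import secrets


class GenericGroupOracle:
    """Lazily sampled generic group oracle following Fig. 1 (a sketch).

    Assumptions for illustration: the group is (Z_p, +), so "mul" and "div"
    mean addition and subtraction mod p; encodings are random enc_bits-bit
    integers (enc_bits must be large enough that fresh encodings exist).
    """

    def __init__(self, p, enc_bits=32):
        self.p, self.enc_bits = p, enc_bits
        self.L_G = []  # group elements (list L^G)
        self.L_E = []  # encodings (list L^E), with phi(L_G[i]) = L_E[i]

    def encode(self, elements):
        """Procedure Encode: reuse an assigned encoding or sample a fresh one."""
        out = []
        for g in elements:
            if g in self.L_G:
                e = self.L_E[self.L_G.index(g)]
            else:
                e = secrets.randbits(self.enc_bits)
                while e in self.L_E:        # fresh: e from E \ L^E
                    e = secrets.randbits(self.enc_bits)
            self.L_G.append(g)              # entries may repeat (footnote 3)
            self.L_E.append(e)
            out.append(e)
        return out

    def get_idx(self, encodings):
        """Procedure GetIdx: smallest i with L^E_i = e (ValueError if absent)."""
        return [self.L_E.index(e) for e in encodings]

    def query(self, e1, e2, op):
        """Oracle O(e, e', op) for op in {"mul", "div"}."""
        i, j = self.get_idx([e1, e2])
        a, b = self.L_G[i], self.L_G[j]
        c = (a + b) % self.p if op == "mul" else (a - b) % self.p
        return self.encode([c])[0]
```

Requesting the same element twice yields the same encoding, and the lists may contain repeated entries as noted in footnote 3; a query on an unknown encoding raises `ValueError`, the sketch's stand-in for the undefined behavior of footnote 4.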

Remark 2. We note that the above restrictions are without loss of generality. To explain this, recall that the assignment between group elements and encodings is random. An alternative implementation $O'$ of $O$ could, given an encoding $e \notin \mathcal{L}^E$, assign a random group element $G \leftarrow_{\$} \mathbb{G} \setminus \mathcal{L}^G$ to $e$ by appending $G$ to $\mathcal{L}^G$ and $e$ to $\mathcal{L}^E$, in which case $\mathcal{R}$ would obtain an encoding of an independent, new group element. Of course $\mathcal{R}$ can simulate this behavior easily when interacting with $O$, too.


Generic reductions. Recall that a (fully black-box [23]) reduction from problem $\Pi$ to problem $\Sigma$ is an efficient algorithm $\mathcal{R}$ that solves $\Pi$, having black-box access to an algorithm $\mathcal{A}$ solving $\Sigma$.

In the sequel we consider reductions $\mathcal{R}^{\mathcal{A}, O}$ having black-box access to an algorithm $\mathcal{A}$ as well as to a generic group oracle $O$. A generic reduction receives as input a challenge $\hat C = (\phi(C_1), \dots, \phi(C_u), C') \in E^u \times \{0,1\}^*$ consisting of $u$ encoded group elements and a bit string $C'$. $\mathcal{R}$ may perform computations on encoded group elements by invoking a generic group oracle $O$ as described above, and interacts with algorithm $\mathcal{A}$ to compute a solution $\hat S = (\phi(S_1), \dots, \phi(S_w), S') \in E^w \times \{0,1\}^*$, which again may consist of encoded group elements $\phi(S_1), \dots, \phi(S_w)$ and a bit string $S' \in \{0,1\}^*$. Reductions from an interactive computational problem $\Pi$ may additionally have access to an oracle $\mathcal{S}_\Pi$ corresponding to $\Pi$; we write $\mathcal{R}^{\mathcal{A}, O, \mathcal{S}_\Pi}$.

We stress that the adversary $\mathcal{A}$ does not necessarily have to be a generic algorithm. It may not be immediately obvious that a generic reduction can make use of a non-generic adversary, considering that $\mathcal{A}$ might expect a particular encoding of the group elements. However, this is indeed possible. In particular, most reductions in security proofs for cryptosystems that are based on algebraic groups (like [22, 6, 31], to name a few well-known examples) are independent of a particular group representation, and thus generic.

Recall that $\mathcal{R}$ is fully black-box, i.e., $\mathcal{A}$ is external to $\mathcal{R}$. Thus, the environment in which the reduction is run can easily translate between the two encodings. Consider as an example the reduction shown in Figure 2 that interacts with a non-generic adversary $\mathcal{A}$. Our notion of generic reductions merely formalizes that the reduction works identically for any group representation. This is illustrated in Figure 2 with an "environment" converting group elements received and output by the reduction from one group representation to another. Note also that essentially all security reductions (from a computational problem in an algebraic group) in the literature are generic. We stress that we model only the reduction $\mathcal{R}$ as a generic algorithm. We do not restrict the forger $\mathcal{A}$ in this way, as is commonly done in security proofs in the generic group model.



Fig. 2. An example of the interaction between a generic reduction $\mathcal{R}$ and a non-generic adversary $\mathcal{A}$ against the unforgeability of Schnorr signatures. All group elements (such as the challenge input, random oracle queries, and the signature output by $\mathcal{A}$) are encoded by the environment before being passed to $\mathcal{R}$. In the other direction, encodings of group elements output by $\mathcal{R}$ (such as the public key that is the input of $\mathcal{A}$, random oracle responses, and the solution output by $\mathcal{R}$) are decoded before being passed to the outside world.


It may not be obvious that this is possible, because $\mathcal{A}$ expects as input group elements in some specific encoding, while $\mathcal{R}$ can only specify them in the form of random encodings. However, the reduction only gets access to the adversary as a black box, which means that the adversary is external to the reduction, and the environment in which the reduction is run can easily translate between the encodings used by reduction and adversary. Further note that, while some reduction from a problem $\Pi$ may be generic, the actual algorithm solving said problem is not $\mathcal{R}$ itself, but the composition of $\mathcal{R}$ and $\mathcal{A}$, which may be non-generic. In particular, this means that any results about equivalence of interesting problems in the generic group model do not apply to the reduction.
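The translation performed by the environment in Figure 2 can be sketched as below. The sketch assumes that concrete group elements are hashable Python values, that encodings are random 32-bit integers sampled by the environment itself (so every encoding the reduction hands out is one the environment created), and that the reduction exposes its random oracle as a function `ro(encoding, message)`. The class `Environment` and its method `wrap` are hypothetical names.

```python
import secrets


class Environment:
    """Translates between a reduction's encodings and a concrete representation (a sketch)."""

    def __init__(self, enc_bits=32):
        self.enc_bits = enc_bits
        self.to_elem = {}  # encoding -> concrete element
        self.to_enc = {}   # concrete element -> encoding

    def encode(self, h):
        """Return the encoding of concrete element h, sampling a fresh one on first use."""
        if h not in self.to_enc:
            e = secrets.randbits(self.enc_bits)
            while e in self.to_elem:
                e = secrets.randbits(self.enc_bits)
            self.to_enc[h], self.to_elem[e] = e, h
        return self.to_enc[h]

    def decode(self, e):
        return self.to_elem[e]

    def wrap(self, adversary):
        """Expose a non-generic adversary to a generic reduction.

        The reduction calls the wrapped adversary with an encoded public key and
        its random oracle ro(encoding, message). The adversary only sees concrete
        elements; its oracle queries and its forgery (R, y) are encoded before
        they reach the reduction.
        """
        def wrapped(encoded_pk, m, ro):
            concrete_ro = lambda R, msg: ro(self.encode(R), msg)
            R, y = adversary(self.decode(encoded_pk), m, concrete_ro)
            return self.encode(R), y
        return wrapped
```

The adversary only ever sees concrete elements and the reduction only ever sees encodings, so the reduction's code does not depend on which representation the adversary uses.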


3 Unconditional Tightness Bound for Generic Reductions 

In this section, we investigate the possibility of finding a tight generic reduction $\mathcal{R}$ that reduces a representation-invariant computational problem $\Pi$ to breaking the UF-NM-security of the Schnorr signature scheme. Our results in this direction are negative, showing that it is impossible to find a tight generic reduction from any representation-invariant computational problem. This includes even interactive problems.

3.1 Single-Instance Reductions 

We begin by considering a very simple class of reductions that we call vanilla reductions. A vanilla reduction is a reduction that runs the UF-NM forger $\mathcal{A}$ exactly once (without restarting or rewinding) in order to solve the problem $\Pi$. This allows us to explain and analyze the new simulation technique. Later we turn to reductions that may execute $\mathcal{A}$ repeatedly, like, for instance, the known security proof from [22] based on the Forking Lemma.

An Inefficient Adversary $\mathcal{A}$. In this section we describe an inefficient adversary $\mathcal{A}$ that breaks the UF-NM-security of the Schnorr signature scheme. Recall that a black-box reduction $\mathcal{R}$ must work for any attacker $\mathcal{A}$. Thus, algorithm $\mathcal{R}^{\mathcal{A}}$ will solve the challenge problem $\Pi$, given black-box access to $\mathcal{A}$. The meta-reduction will be able to simulate this attacker efficiently for any generic reduction $\mathcal{R}$. We describe this attacker for comprehensibility, in order to make our meta-reduction more accessible to the reader.

1. The input of $\mathcal{A}$ is a Schnorr public key $X$, a message $m$, and random coins $\omega$.

2. The forger $\mathcal{A}$ chooses $q$ uniformly random group elements $R_1, \dots, R_q \leftarrow_{\$} \mathbb{G}$. (We make the assumption that $q < |\mathbb{G}|$.) Subsequently, the forger $\mathcal{A}$ queries the random oracle $H$ on $(R_i, m)$ for all $i \in [q]$. Let $c_i := H(R_i, m) \in \mathbb{Z}_p$ be the corresponding answers.

3. Finally, the forger $\mathcal{A}$ chooses an index $\alpha \leftarrow_{\$} [q]$ uniformly at random, computes $y \in \mathbb{Z}_p$ which satisfies the equation $g^y = X^{c_\alpha} \cdot R_\alpha$, and outputs $(R_\alpha, y)$. For concreteness, we assume this computation is performed by exhaustive search over all $y \in \mathbb{Z}_p$ (recall that we consider an unbounded attacker here; we show later how to instantiate it efficiently).

Note that $(R_\alpha, y)$ is a valid signature for message $m$ with respect to the public key $X$. Thus, the forger $\mathcal{A}$ breaks the UF-NM-security of Schnorr signatures with probability 1.
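For intuition, the three steps of the unbounded forger can be written out for a toy Schnorr group. The parameters (the order-1019 subgroup of $\mathbb{Z}_{2039}^*$ with generator 4) and the SHA-256 stand-in for the random oracle are assumptions for illustration; `unbounded_forger` and `verify` are hypothetical names. In the code, `ORDER` plays the role of the group order $p$ and `q` is the number of random oracle queries.

```python
import hashlib
import secrets

# Toy Schnorr group (assumed parameters, insecure): MODULUS = 2*ORDER + 1 and
# GEN = 4 generates the subgroup of prime order ORDER (the paper's p).
MODULUS, ORDER, GEN = 2039, 1019, 4


def H(R, m):
    """Stand-in for the random oracle: SHA-256 of (R, m) reduced mod ORDER."""
    digest = hashlib.sha256(f"{R}|{m}".encode()).digest()
    return int.from_bytes(digest, "big") % ORDER


def unbounded_forger(X, m, q, ro=H):
    """The inefficient UF-NM forger of Section 3.1 (sketch; q = number of RO queries)."""
    # Step 2: q uniformly random group elements and their random oracle answers.
    Rs = [pow(GEN, secrets.randbelow(ORDER), MODULUS) for _ in range(q)]
    cs = [ro(R, m) for R in Rs]
    # Step 3: pick alpha and solve g^y = X^{c_alpha} * R_alpha by exhaustive search.
    alpha = secrets.randbelow(q)
    target = pow(X, cs[alpha], MODULUS) * Rs[alpha] % MODULUS
    for y in range(ORDER):
        if pow(GEN, y, MODULUS) == target:
            return Rs[alpha], y
    raise AssertionError("unreachable: target lies in the subgroup generated by GEN")


def verify(X, m, R, y, ro=H):
    """Schnorr verification: g^y == X^{H(R, m)} * R."""
    return pow(GEN, y, MODULUS) == pow(X, ro(R, m), MODULUS) * R % MODULUS
```

The exhaustive search over all of $\mathbb{Z}_p$ is feasible here only because the toy group is tiny; it is exactly the step that the proof of Theorem 1 later replaces by an efficient simulation.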

Main Result for Vanilla Reductions. Now we are ready to prove our main result for vanilla reductions.

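Theorem 1. Let $\Pi = (\mathsf{G}_\Pi, \mathcal{S}_\Pi, \mathsf{V}_\Pi)$ be a representation-invariant (possibly interactive) computational problem with a challenge consisting of $u$ group elements, and let $p$ be the group order. Suppose there exists a generic vanilla reduction $\mathcal{R}$ that $(\epsilon_{\mathcal{R}}, t_{\mathcal{R}})$-solves $\Pi$, having one-time black-box access to an attacker $\mathcal{A}$ that $(\epsilon_{\mathcal{A}}, t_{\mathcal{A}})$-breaks the UF-NM-security of Schnorr signatures with success probability $\epsilon_{\mathcal{A}} = 1$ by asking $q$ random oracle queries. Then there exists an algorithm $\mathcal{M}$ that $(\epsilon, t)$-solves $\Pi$ with

$$\epsilon \geq \epsilon_{\mathcal{R}} - \frac{2(u + q + t_{\mathcal{R}})^2}{p} \quad \text{and} \quad t \approx t_{\mathcal{R}}.$$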

Remark 3. Observe that Theorem 1 rules out reductions from nearly arbitrary computational problems (even interactive ones). At first glance this might look contradictory: for instance, there always exists a trivial reduction from the problem of forging Schnorr signatures to solving the same problem. However, as explained in Example 5, forging Schnorr signatures is not a representation-invariant computational problem; therefore this is not a contradiction.

Proof. Assume that there exists a generic vanilla reduction $\mathcal{R} := \mathcal{R}^{O, \mathcal{S}'_\Pi, \mathcal{A}}$ that $(\epsilon_{\mathcal{R}}, t_{\mathcal{R}})$-solves $\Pi$ when given access to a generic group oracle $O$, an oracle $\mathcal{S}'_\Pi$, and a forger $\mathcal{A}(\phi(X), m, \omega)$, where the inputs to the forger are chosen by $\mathcal{R}$. Furthermore, the reduction $\mathcal{R}$ simulates the random oracle $\mathcal{R}.H$ for $\mathcal{A}$. We show how to build a meta-reduction $\mathcal{M}$ that has black-box access to $\mathcal{R}$ and to an oracle $\mathcal{S}_\Pi$ and that solves the representation-invariant problem $\Pi$ directly.

We describe $\mathcal{M}$ in a sequence of games, beginning with an inefficient implementation $\mathcal{M}_0$ of $\mathcal{M}$, and we modify it gradually until we obtain an efficient implementation $\mathcal{M}_2$ of $\mathcal{M}$. We bound the probability with which any reduction $\mathcal{R}$ can distinguish each implementation $\mathcal{M}_i$ from $\mathcal{M}_{i-1}$ for all $i \in \{1, 2\}$, which yields that $\mathcal{M}_2$ is an efficient algorithm that can use $\mathcal{R}$ to solve $\Pi$ if $\mathcal{R}$ is tight.

In what follows, let $X_i$ denote the event that $\mathcal{R}$ outputs a valid solution to the given problem instance $C$ of $\Pi$ in Game $i$.

Game 0. Our meta-reduction $\mathcal{M}_0 := \mathcal{M}_0^{\mathcal{S}_\Pi}$ is an algorithm for solving a representation-invariant computational problem $\Pi$, as defined in Section 2.3. That is, $\mathcal{M}_0$ takes as input an instance $C = (C_1, \dots, C_u, C') \in \mathbb{G}^u \times \{0,1\}^*$ of the representation-invariant computational problem $\Pi$, has access to oracle $\mathcal{S}_\Pi$ provided by $\Pi$, and outputs a candidate solution $S$. $\mathcal{R}$ is a generic reduction, i.e., a representation-independent algorithm for $\Pi$ having black-box access to an attacker $\mathcal{A}$. Algorithm $\mathcal{M}_0$ runs reduction $\mathcal{R}$ as a subroutine, simulating the generic group oracle $O$, the oracle $\mathcal{S}'_\Pi$, and attacker $\mathcal{A}$ for $\mathcal{R}$. In order to provide the generic group oracle for $\mathcal{R}$, $\mathcal{M}_0$ implements the following procedures (cf. Figure 3).


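PROC $\mathcal{M}_0(C)$
  # Initialization
  parse $C = (C_1, \dots, C_u, C')$
  $\mathcal{L}^G := \emptyset$; $\mathcal{L}^E := \emptyset$
  $R = (R_1, \dots, R_q) \leftarrow_{\$} \mathbb{G}^q$
  $\mathcal{X} := (C_1, \dots, C_u, R_1, \dots, R_q)$
  $\mathsf{Encode}(\mathcal{X})$
  $\hat C := (\mathcal{L}^E_1, \dots, \mathcal{L}^E_u, C')$
  $\hat S \leftarrow_{\$} \mathcal{R}^{O, \mathcal{S}'_\Pi, \mathcal{A}}(\hat C)$
  # Finalization
  parse $\hat S := (\hat S_1, \dots, \hat S_w, S')$
  $(i_1, \dots, i_w) := \mathsf{GetIdx}(\hat S_1, \dots, \hat S_w)$
  return $(\mathcal{L}^G_{i_1}, \dots, \mathcal{L}^G_{i_w}, S')$

PROC $\mathcal{A}(\phi(X), m, \omega)$
  for all $i \in [q]$
    $c_i := \mathcal{R}.H(\phi(R_i), m)$
  $\alpha \leftarrow_{\$} [q]$
  $y := \log_g X^{c_\alpha} R_\alpha$
  return $(R_\alpha, y)$

PROC $\mathcal{S}'_\Pi(Q)$
  parse $Q = (e_1, \dots, e_v, Q')$
  $(i_1, \dots, i_v) := \mathsf{GetIdx}(e_1, \dots, e_v)$
  $(A_1, \dots, A_v, A') := \mathcal{S}_\Pi(\mathcal{L}^G_{i_1}, \dots, \mathcal{L}^G_{i_v}, Q')$
  $(f_1, \dots, f_v) := \mathsf{Encode}(A_1, \dots, A_v)$
  return $(f_1, \dots, f_v, A')$

Fig. 3. Implementation of $\mathcal{M}_0$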


Initialization of $\mathcal{M}_0$. At the beginning of the game, $\mathcal{M}_0$ initializes two lists $\mathcal{L}^G := \emptyset$ and $\mathcal{L}^E := \emptyset$, which are used to simulate the generic group oracle $O$. Furthermore, $\mathcal{M}_0$ chooses $R = (R_1, \dots, R_q) \leftarrow_{\$} \mathbb{G}^q$ at random (these values will later be used by the simulated attacker $\mathcal{A}$), sets $\mathcal{X} := (C_1, \dots, C_u, R_1, \dots, R_q)$, and runs $\mathsf{Encode}(\mathcal{X})$ to assign encodings to these group elements. Then $\mathcal{M}_0$ starts the reduction $\mathcal{R}$ on input $\hat C := (\mathcal{L}^E_1, \dots, \mathcal{L}^E_u, C')$. Note that $\hat C$ is an encoded version of the challenge instance of $\Pi$ received by $\mathcal{M}_0$. That is, we have $\hat C = (\phi(C_1), \dots, \phi(C_u), C')$. Oracle queries of $\mathcal{R}$ are answered by $\mathcal{M}_0$ as follows:
Generic group oracle $O(e, e', \circ)$: To simulate the generic group oracle, $\mathcal{M}_0$ implements procedures Encode and GetIdx as described in Section 2.4. Whenever $\mathcal{R}$ submits a query $(e, e', \circ) \in E \times E \times \{\cdot, \div\}$ to the generic group oracle $O$, the meta-reduction determines the smallest indices $i$ and $j$ such that $e = \mathcal{L}^E_i$ and $e' = \mathcal{L}^E_j$ by calling $(i, j) = \mathsf{GetIdx}(e, e')$. Then it computes $\mathcal{L}^G_i \circ \mathcal{L}^G_j$ and returns $\mathsf{Encode}(\mathcal{L}^G_i \circ \mathcal{L}^G_j)$.
Oracle $\mathcal{S}'_\Pi(Q)$: This procedure handles queries issued by $\mathcal{R}$ to $\mathcal{S}'_\Pi$ by forwarding them to oracle $\mathcal{S}_\Pi$ provided by the challenger and returning the response. That is, whenever $\mathcal{R}$ submits a query $Q = (e_1, \dots, e_v, Q') \in E^v \times \{0,1\}^*$ to $\mathcal{S}'_\Pi$, the meta-reduction runs $(i_1, \dots, i_v) := \mathsf{GetIdx}(e_1, \dots, e_v)$ and queries $\mathcal{S}_\Pi$ to compute $(A_1, \dots, A_v, A') := \mathcal{S}_\Pi(\mathcal{L}^G_{i_1}, \dots, \mathcal{L}^G_{i_v}, Q')$. Then $\mathcal{M}_0$ determines the corresponding encodings as $(f_1, \dots, f_v) := \mathsf{Encode}(A_1, \dots, A_v)$ and returns $(f_1, \dots, f_v, A')$ to $\mathcal{R}$.
The forger $\mathcal{A}(\phi(X), m, \omega)$: This procedure implements a simulation of the inefficient attacker $\mathcal{A}$ described in Section 3.1. It proceeds as follows. When $\mathcal{R}$ outputs $(\phi(X), m, \omega)$ to invoke an instance of $\mathcal{A}$, $\mathcal{A}$ queries the random oracle $\mathcal{R}.H$ provided by $\mathcal{R}$ on $(\phi(R_i), m)$ for all $i \in [q]$ to determine $c_i = \mathcal{R}.H(\phi(R_i), m)$. Afterwards, $\mathcal{A}$ chooses an index $\alpha \leftarrow_{\$} [q]$ uniformly at random, computes the discrete logarithm $y := \log_g X^{c_\alpha} R_\alpha$ by exhaustive search, and outputs $(R_\alpha, y)$. (This step is not efficient. We show in subsequent games how to implement this attacker efficiently.)
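Finalization of $\mathcal{M}_0$. Eventually, the algorithm $\mathcal{R}$ outputs a solution $\hat S := (\hat S_1, \dots, \hat S_w, S') \in E^w \times \{0,1\}^*$. The algorithm $\mathcal{M}_0$ runs $(i_1, \dots, i_w) := \mathsf{GetIdx}(\hat S_1, \dots, \hat S_w)$ to determine the indices of group elements $(\mathcal{L}^G_{i_1}, \dots, \mathcal{L}^G_{i_w})$ corresponding to encodings $(\hat S_1, \dots, \hat S_w)$, and outputs $(\mathcal{L}^G_{i_1}, \dots, \mathcal{L}^G_{i_w}, S')$.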

Analysis of $\mathcal{M}_0$. Note that $\mathcal{M}_0$ provides a perfect simulation of the oracles $O$ and $\mathcal{S}'_\Pi$, and it also mimics the attacker from Section 3.1 perfectly. In particular, $(R_\alpha, y)$ is a valid forgery for message $m$, and thus $\mathcal{R}$ outputs a solution $\hat S = (\hat S_1, \dots, \hat S_w, S')$ to $\hat C$ with probability $\Pr[X_0] = \epsilon_{\mathcal{R}}$. Since $\Pi$ is assumed to be representation-invariant, $S := (S_1, \dots, S_w, S')$ with $\hat S_i = \phi(S_i)$ for $i \in [w]$ is therefore a valid solution to $C$. Thus, $\mathcal{M}_0$ outputs a valid solution $S$ to $C$ with probability $\epsilon_{\mathcal{R}}$.

Game 1. In this game we introduce a meta-reduction $\mathcal{M}_1$ which essentially extends $\mathcal{M}_0$ with additional bookkeeping to record the sequence of group operations performed by $\mathcal{R}$. The purpose of this intermediate game is to simplify our analysis of the final implementation $\mathcal{M}_2$. Meta-reduction $\mathcal{M}_1$ proceeds identically to $\mathcal{M}_0$ except for a few differences (cf. Figure 4).

Initialization of $\mathcal{M}_1$. The initialization is exactly as before, except that $\mathcal{M}_1$ maintains an additional list $\mathcal{L}^V$ of elements of $\mathbb{Z}_p^{u+q}$. Let $\mathcal{L}^V_i$ denote the $i$-th entry of $\mathcal{L}^V$.

List $\mathcal{L}^V$ is initialized with the $u + q$ canonical unit vectors in $\mathbb{Z}_p^{u+q}$. That is, let $\eta_i$ denote the $i$-th canonical unit vector in $\mathbb{Z}_p^{u+q}$, i.e., $\eta_1 = (1, 0, \dots, 0)$, $\eta_2 = (0, 1, 0, \dots, 0)$, ..., $\eta_{u+q} = (0, \dots, 0, 1)$. Then $\mathcal{L}^V$ is initialized such that $\mathcal{L}^V_i := \eta_i$ for all $i \in [u + q]$.
Generic group oracle $O(e, e', \circ)$: In parallel to computing the group operation, the generic group oracle implemented by $\mathcal{M}_1$ also performs computations on vectors of $\mathcal{L}^V$.
Given a query $(e, e', \circ) \in E \times E \times \{\cdot, \div\}$, the oracle $O$ determines the smallest indices $i$ and $j$ such that $e = \mathcal{L}^E_i$ and $e' = \mathcal{L}^E_j$ by calling GetIdx. It computes $a := \mathcal{L}^V_i \star \mathcal{L}^V_j$, where $\star := +$ if $\circ = \cdot$ and $\star := -$ if $\circ = \div$, and appends $a$ to $\mathcal{L}^V$. Finally it returns $\mathsf{Encode}(\mathcal{L}^G_i \circ \mathcal{L}^G_j)$.

Analysis of $\mathcal{M}_1$. Recall that the initial content $\mathcal{X}$ of $\mathcal{L}^G$ is $\mathcal{X} = (C_1, \dots, C_u, R_1, \dots, R_q)$, and that $\mathcal{R}$ performs only group operations on $\mathcal{X}$. Thus, any group element $h \in \mathcal{L}^G$ can be written as $h = \prod_{i=1}^{u} C_i^{a_i} \cdot \prod_{i=1}^{q} R_i^{a_{u+i}}$, where the vector $a = (a_1, \dots, a_{u+q}) \in \mathbb{Z}_p^{u+q}$ is (essentially) determined by the sequence of queries issued by $\mathcal{R}$ to $O$. For a vector $a \in \mathbb{Z}_p^{u+q}$ and a vector of group elements $V = (v_1, \dots, v_{u+q}) \in \mathbb{G}^{u+q}$, let us write $\mathsf{Eval}(V, a)$ as shorthand for $\mathsf{Eval}(V, a) := \prod_{i=1}^{u+q} v_i^{a_i}$. In particular, it holds that $\mathsf{Eval}(\mathcal{X}, a) = \prod_{i=1}^{u} C_i^{a_i} \cdot \prod_{i=1}^{q} R_i^{a_{u+i}}$. The key motivation for the changes introduced in Game 1 is that now (by construction of $\mathcal{L}^V$) it holds that $\mathcal{L}^G_i = \mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_i)$ for all $i \in [|\mathcal{L}^G|]$. Thus, at any point in time during the execution of $\mathcal{R}$, the entire list $\mathcal{L}^G$ of group elements can be recomputed from $\mathcal{L}^V$ and $\mathcal{X}$ by setting $\mathcal{L}^G_i := \mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_i)$ for $i \in [|\mathcal{L}^V|]$. The reduction $\mathcal{R}$ is completely oblivious to this additional bookkeeping performed by $\mathcal{M}_1$; thus we have $\Pr[X_1] = \Pr[X_0]$.
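The bookkeeping of Game 1 can be illustrated in a few lines. The sketch below is written for a toy group of our own choosing (the order-1019 subgroup of $\mathbb{Z}_{2039}^*$ generated by 4, with all initial elements taken from that subgroup); it ignores encodings and tracks only $\mathcal{L}^G$ and $\mathcal{L}^V$. The function `eval_vec` implements $\mathsf{Eval}$, and the class `Bookkeeping` is a hypothetical helper.

```python
# Toy group (an assumption, insecure): GEN = 4 generates the subgroup of
# prime order ORDER = 1019 in Z_MODULUS^* with MODULUS = 2039.
MODULUS, ORDER, GEN = 2039, 1019, 4


def eval_vec(V, a):
    """Eval(V, a) = prod_i V_i^{a_i} for group elements V and exponents a in Z_ORDER."""
    out = 1
    for v, a_i in zip(V, a):
        out = out * pow(v, a_i % ORDER, MODULUS) % MODULUS
    return out


class Bookkeeping:
    """The lists L^G and L^V of Game 1, without encodings (hypothetical helper)."""

    def __init__(self, X):
        n = len(X)
        self.X = list(X)      # initial contents (C_1..C_u, R_1..R_q)
        self.L_G = list(X)
        # L^V starts with the canonical unit vectors eta_1, ..., eta_n
        self.L_V = [[int(k == i) for k in range(n)] for i in range(n)]

    def op(self, i, j, o):
        """Group operation o in {"mul", "div"} on entries i and j; returns the new index."""
        sign = 1 if o == "mul" else -1
        self.L_V.append([(x + sign * y) % ORDER for x, y in zip(self.L_V[i], self.L_V[j])])
        other = self.L_G[j] if o == "mul" else pow(self.L_G[j], -1, MODULUS)
        self.L_G.append(self.L_G[i] * other % MODULUS)
        return len(self.L_G) - 1
```

After any sequence of operations, every entry of $\mathcal{L}^G$ can be recomputed from $\mathcal{X}$ and its exponent vector, which is the invariant the proof relies on. (The modular inverse `pow(x, -1, m)` requires Python 3.8 or later.)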


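PROC $\mathcal{M}_1(C)$
  # Initialization
  parse $C = (C_1, \dots, C_u, C')$
  $\mathcal{L}^G := \emptyset$; $\mathcal{L}^E := \emptyset$; [$\mathcal{L}^V := \emptyset$]
  $R = (R_1, \dots, R_q) \leftarrow_{\$} \mathbb{G}^q$
  $\mathcal{X} := (C_1, \dots, C_u, R_1, \dots, R_q)$
  $\mathsf{Encode}(\mathcal{X})$
  [$\mathcal{L}^V_i := \eta_i$ for all $i \in [u + q]$]
  $\hat C := (\mathcal{L}^E_1, \dots, \mathcal{L}^E_u, C')$
  $\hat S \leftarrow_{\$} \mathcal{R}^{O, \mathcal{S}'_\Pi, \mathcal{A}}(\hat C)$
  # Finalization
  parse $\hat S := (\hat S_1, \dots, \hat S_w, S')$
  $(i_1, \dots, i_w) := \mathsf{GetIdx}(\hat S_1, \dots, \hat S_w)$
  return $(\mathcal{L}^G_{i_1}, \dots, \mathcal{L}^G_{i_w}, S')$

PROC $O(e, e', \circ)$
  $(e, e', \circ) \in E \times E \times \{\cdot, \div\}$
  $i := \mathsf{GetIdx}(e)$
  $j := \mathsf{GetIdx}(e')$
  [$a := \mathcal{L}^V_i \star \mathcal{L}^V_j$]
  [append $a$ to $\mathcal{L}^V$]
  return $\mathsf{Encode}(\mathcal{L}^G_i \circ \mathcal{L}^G_j)$

Fig. 4. Meta-reduction $\mathcal{M}_1$. Boxed elements (shown here in square brackets) show the differences to $\mathcal{M}_0$. All other procedures are identical to $\mathcal{M}_0$ and thus omitted.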


Game 2. Note that the meta-reductions described in previous games were not efficient, because the simulation of the attacker in procedure $\mathcal{A}$ needed to compute a discrete logarithm by exhaustive search. In this final game, we construct a meta-reduction $\mathcal{M}_2$ that simulates $\mathcal{A}$ efficiently. $\mathcal{M}_2$ proceeds exactly like $\mathcal{M}_1$ except for the following (cf. Figure 5).

The forger $\mathcal{A}(\phi(X), m, \omega)$: When $\mathcal{R}$ outputs $(\phi(X), m, \omega)$ to invoke an instance of $\mathcal{A}$, $\mathcal{A}$ queries the random oracle $\mathcal{R}.H$ provided by $\mathcal{R}$ on $(\phi(R_i), m)$ for all $i \in [q]$ to determine $c_i = \mathcal{R}.H(\phi(R_i), m)$. Then it chooses an index $\alpha \leftarrow_{\$} [q]$ uniformly at random, samples an element $y$ uniformly at random from $\mathbb{Z}_p$, computes $R^*_\alpha := g^y X^{-c_\alpha}$, and re-computes the entire list $\mathcal{L}^G$ using $R^*_\alpha$ instead of $R_\alpha$.

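More precisely, let $\mathcal{X}^* := (C_1, \dots, C_u, R_1, \dots, R_{\alpha-1}, R^*_\alpha, R_{\alpha+1}, \dots, R_q)$. Observe that the vector $\mathcal{X}^*$ is identical to the initial contents $\mathcal{X}$ of $\mathcal{L}^G$, with the difference that $R_\alpha$ is replaced by $R^*_\alpha$. The list $\mathcal{L}^G$ is now recomputed from $\mathcal{L}^V$ and $\mathcal{X}^*$ by setting $\mathcal{L}^G_i := \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_i)$ for all $i \in [|\mathcal{L}^V|]$. Finally, $\mathcal{M}_2$ returns $(\phi(R^*_\alpha), y)$ to $\mathcal{R}$ as the forgery.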

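Analysis of $\mathcal{M}_2$. First note that $(\phi(R^*_\alpha), y)$ is a valid signature, since $\phi(R^*_\alpha)$ is the encoding of group element $R^*_\alpha$ satisfying the verification equation $g^y = X^{c_\alpha} \cdot R^*_\alpha$, where $c_\alpha = \mathcal{R}.H(\phi(R^*_\alpha), m)$. Next we claim that $\mathcal{R}$ is not able to distinguish $\mathcal{M}_2$ from $\mathcal{M}_1$ except with negligibly small probability. To show this, observe that Game 2 and Game 1 are perfectly indistinguishable if for all pairs of vectors $\mathcal{L}^V_i, \mathcal{L}^V_j \in \mathcal{L}^V$ it holds that $\mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_i) = \mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_j) \iff \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_i) = \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_j)$, because in this case $\mathcal{M}_1$ chooses identical encodings for two group elements $\mathcal{L}^G_i, \mathcal{L}^G_j \in \mathcal{L}^G$ if and only if $\mathcal{M}_2$ chooses identical encodings.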
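The re-programming step of Game 2 and the consistency condition just stated can be sketched on a toy group (an assumption: the order-1019 subgroup of $\mathbb{Z}_{2039}^*$ generated by 4). Encodings are omitted because $\mathcal{M}_2$ leaves $\mathcal{L}^E$ untouched and only recomputes $\mathcal{L}^G$. The helpers `reprogram` and `equality_pattern_preserved` are hypothetical, and the optional argument `y` is a test hook that does not appear in the paper.

```python
import secrets

# Toy group (an assumption, insecure): GEN = 4 generates the subgroup of
# prime order ORDER = 1019 in Z_MODULUS^* with MODULUS = 2039.
MODULUS, ORDER, GEN = 2039, 1019, 4


def eval_vec(V, a):
    """Eval(V, a) = prod_i V_i^{a_i}."""
    out = 1
    for v, a_i in zip(V, a):
        out = out * pow(v, a_i % ORDER, MODULUS) % MODULUS
    return out


def reprogram(X_vec, L_V, pk, c_alpha, alpha_pos, y=None):
    """Game 2 forger step (sketch): R*_alpha := g^y * pk^(-c_alpha), then L^G := Eval(X*, L^V).

    X_vec is the initial vector (C_1..C_u, R_1..R_q) and alpha_pos the position of
    R_alpha in it. y is sampled uniformly unless given (a test hook, not in the paper).
    Returns (y, X_star, recomputed L^G).
    """
    if y is None:
        y = secrets.randbelow(ORDER)
    R_star = pow(GEN, y, MODULUS) * pow(pk, -c_alpha % ORDER, MODULUS) % MODULUS
    X_star = list(X_vec)
    X_star[alpha_pos] = R_star
    return y, X_star, [eval_vec(X_star, a) for a in L_V]


def equality_pattern_preserved(X_vec, X_star, L_V):
    """Eval(X, a_i) = Eval(X, a_j) iff Eval(X*, a_i) = Eval(X*, a_j), for all pairs i < j."""
    before = [eval_vec(X_vec, a) for a in L_V]
    after = [eval_vec(X_star, a) for a in L_V]
    return all((before[i] == before[j]) == (after[i] == after[j])
               for i in range(len(L_V)) for j in range(i + 1, len(L_V)))
```

The forged pair always verifies, since $g^y = X^{c_\alpha} \cdot R^*_\alpha$ holds by construction. A `False` result from `equality_pattern_preserved` corresponds to the event $F$ of Lemma 1; in a toy group it can be provoked on purpose, whereas for a large group order its probability is bounded by Lemma 1.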


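PROC $\mathcal{A}(\phi(X), m, \omega)$:
  $\alpha \leftarrow_{\$} [q]$
  for all $i \in [q]$
    $c_i := \mathcal{R}.H(\phi(R_i), m)$
  $y \leftarrow_{\$} \mathbb{Z}_p$; $R^*_\alpha := g^y X^{-c_\alpha}$
  $\mathcal{X}^* := (C_1, \dots, C_u, R_1, \dots, R_{\alpha-1}, R^*_\alpha, R_{\alpha+1}, \dots, R_q)$
  for $j = 1, \dots, |\mathcal{L}^G|$ do
    $\mathcal{L}^G_j := \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_j)$
  return $(\phi(R^*_\alpha), y)$

Fig. 5. Efficient simulation of attacker $\mathcal{A}$ by $\mathcal{M}_2$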


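Lemma 1. Let $F$ denote the event that $\mathcal{R}$ computes vectors $\mathcal{L}^V_i, \mathcal{L}^V_j \in \mathcal{L}^V$ such that

$$\mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_i) = \mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_j) \;\wedge\; \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_i) \neq \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_j) \tag{1}$$

or

$$\mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_i) \neq \mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_j) \;\wedge\; \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_i) = \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_j). \tag{2}$$

Then

$$\Pr[F] \leq \frac{2(u + q + t_{\mathcal{R}})^2}{p}.$$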

The proof of Lemma 1 is deferred to the full version. We apply it to finish the proof of Theorem 1. By Lemma 1, the algorithm $\mathcal{M}_2$ fails to simulate $\mathcal{M}_1$ with probability at most $2(u + q + t_{\mathcal{R}})^2/p$. Thus, we have $\Pr[X_2] \geq \Pr[X_1] - 2(u + q + t_{\mathcal{R}})^2/p$.

Note also that $\mathcal{M}_2$ provides an efficient simulation of adversary $\mathcal{A}$. The total running time of $\mathcal{M}_2$ is essentially the running time of $\mathcal{R}$ plus some minor additional computations and bookkeeping. Furthermore, if $\mathcal{R}$ is able to $(\epsilon_{\mathcal{R}}, t_{\mathcal{R}})$-solve $\Pi$, then $\mathcal{M}_2$ is able to $(\epsilon, t)$-solve $\Pi$ with

$$\epsilon \geq \Pr[X_2] \geq \epsilon_{\mathcal{R}} - \frac{2(u + q + t_{\mathcal{R}})^2}{p}.$$
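The bound just derived can be evaluated numerically. The helper names below are hypothetical, and the sample parameters in the usage note (a 256-bit group order, $u = 2$, $q = 2^{60}$, $t_{\mathcal{R}} = 2^{100}$) are illustrative choices of ours, not values from the paper.

```python
import math
from fractions import Fraction


def theorem1_lower_bound(eps_R, u, q, t_R, p):
    """Exact value of eps_R - 2 (u + q + t_R)^2 / p, the bound of Theorem 1."""
    return Fraction(eps_R) - Fraction(2 * (u + q + t_R) ** 2, p)


def loss_term_log2(u, q, t_R, p_bits):
    """log2 of the loss term 2 (u + q + t_R)^2 / p for a group order p = 2^p_bits."""
    return 1 + 2 * math.log2(u + q + t_R) - p_bits
```

For the sample parameters, `loss_term_log2(2, 2**60, 2**100, 256)` is about $-55$, i.e., the loss term is roughly $2^{-55}$. So $\mathcal{M}_2$ solves $\Pi$ with essentially the probability $\epsilon_{\mathcal{R}}$ and in essentially the time $t_{\mathcal{R}}$ of the reduction, which is why a tight $\mathcal{R}$ would directly yield an efficient solver for $\Pi$.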


Remark 4. Note that the simulated forger re-computes the entire list $\mathcal{L}^G$ after replacing $R_\alpha$ with $R^*_\alpha$. This ensures consistency of the attacker's view before and after replacing $R_\alpha$ with $R^*_\alpha$ if (and only if) it holds that

$$\mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_i) = \mathsf{Eval}(\mathcal{X}, \mathcal{L}^V_j) \iff \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_i) = \mathsf{Eval}(\mathcal{X}^*, \mathcal{L}^V_j). \tag{3}$$

Lemma 1 bounds the probability that (3) does not hold; thus it bounds the probability that an attacker is able to notice the re-programming by receiving different results before and after the re-programming.


4 Multi-instance Reductions 

Now we turn to multi-instance reductions, which may run multiple sequential executions of the signature forger $\mathcal{A}$. This is the interesting case, in particular because the Forking-Lemma-based security proof for Schnorr signatures by Pointcheval and Stern [22] is of this type.

The meta-reduction, described in detail in the full version, is heavily based on Seurin's meta-reduction [28]. Essentially, we show that our new simulation of forged signatures is compatible with Seurin's approach for simulating a sequence of random oracle queries. In combination, this allows us to prove that a generic reduction from any representation-invariant computational problem $\Pi$ to breaking Schnorr signatures loses a factor of at least $q$, which essentially matches the upper bound of [22].

The description of the corresponding family of adversaries and the proof of the fol- 
lowing theorem can be found in the full version. 

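Theorem 2. Let $\Pi$ be a representation-invariant computational problem. Suppose there exists a generic reduction $\mathcal{R}$ that $(\epsilon_{\mathcal{R}}, t_{\mathcal{R}})$-solves $\Pi$, having $n$-time black-box access to an attacker $\mathcal{A}_q$ that $(\epsilon_{\mathcal{A}}, t_{\mathcal{A}}, q)$-breaks the UF-NM-security of Schnorr signatures with success probability $\epsilon_{\mathcal{A}} < 1$ in time $t_{\mathcal{A}} \approx q$. Then there exists an algorithm $\mathcal{M}$ that $(\epsilon, t)$-solves $\Pi$ with $t \approx t_{\mathcal{R}}$ and

$$\epsilon \geq \epsilon_{\mathcal{R}} - \frac{2n(u + nq + t_{\mathcal{R}})}{p} - \frac{n \ln\bigl((1 - \epsilon_{\mathcal{A}})^{-1}\bigr)}{q\,(1 - p^{-1/4})}.$$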

This bound allows essentially the same analysis as in [28], and thus we arrive (for $\epsilon_{\mathcal{A}} \approx 1 - (1 - 1/q)^q$) at a lower bound for $\epsilon$ of approximately $\epsilon_{\mathcal{R}} - n/q$. Therefore, $\mathcal{R}$ must necessarily lose a factor of almost $1/q$ if the discrete logarithm problem is indeed hard.
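The approximation can be traced by expanding the last term of Theorem 2 at $\epsilon_{\mathcal{A}} = 1 - (1 - 1/q)^q$. The following short derivation is our own expansion (assuming $q \geq 2$), not a computation taken from the full version:

```latex
\begin{align*}
\ln\bigl((1-\epsilon_{\mathcal{A}})^{-1}\bigr)
  &= \ln\bigl((1 - 1/q)^{-q}\bigr)
   = -q \ln\bigl(1 - 1/q\bigr) \\
  &= q\Bigl(\frac{1}{q} + \frac{1}{2q^2} + \frac{1}{3q^3} + \cdots\Bigr)
   = 1 + \frac{1}{2q} + O\bigl(q^{-2}\bigr), \\
\frac{n \ln\bigl((1-\epsilon_{\mathcal{A}})^{-1}\bigr)}{q\,(1 - p^{-1/4})}
  &= \frac{n}{q} \cdot \frac{1 + O(1/q)}{1 - p^{-1/4}}
   \approx \frac{n}{q}.
\end{align*}
```

Since the group order $p$ is exponential in the security parameter while $n$, $q$, and $t_{\mathcal{R}}$ are polynomial, the remaining term with denominator $p$ is negligible, which leaves $\epsilon \gtrsim \epsilon_{\mathcal{R}} - n/q$.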
5 A Note on Tightly-Secure Schnorr-Type Signatures

There exist several variants of Schnorr signatures with tight security reductions from representation-invariant computational problems. These include, for instance, the schemes of Goh and Jarecki [16] and Chevallier-Mames [7], which are based on the computational Diffie-Hellman problem, and the scheme of Shao [29].

It is natural to ask why our tightness bound, in particular our technique of re-programming the group representation, cannot be applied to these schemes. Due to space limitations, we defer this discussion to the full version of this paper.

Acknowledgments. We thank the anonymous reviewers for valuable comments. Nils Fleischhacker and Dominique Schröder were supported by the German Federal Ministry of Education and Research (BMBF) through funding for the Center for IT-Security, Privacy, and Accountability (CISPA; see www.cispa-security.org). Dominique Schröder is also supported by an Intel Early Career Faculty Honor Program Award.


References 

1. Bellare, M., Namprempre, C., Pointcheval, D., Semanko, M.: The one-more-RSA-inversion 
problems and the security of Chaum’s blind signature scheme. Journal of Cryptology 16(3), 
185-215 (2003) 

2. Bellare, M., Palacio, A.: GQ and Schnorr identification schemes: Proofs of security against impersonation under active and concurrent attacks. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, p. 162. Springer, Heidelberg (2002)

3. Bellare, M., Rogaway, P.: Random oracles are practical: A paradigm for designing efficient 
protocols. In: Ashby, V. (ed.) Conference on Computer and Communications Security ACM 
CCS 1993, Fairfax, Virginia, USA, November 3-5, pp. 62-73. ACM Press (1993) 

4. Bellare, M., Rogaway, P.: The exact security of digital signatures - how to sign with RSA and Rabin. In: Maurer, U.M. (ed.) EUROCRYPT 1996. LNCS, vol. 1070, pp. 399-416. Springer, Heidelberg (1996)

5. Bernstein, D.J.: Proving tight security for Rabin-Williams signatures. In: Smart, N.P. (ed.) EUROCRYPT 2008. LNCS, vol. 4965, pp. 70-87. Springer, Heidelberg (2008)

6. Boneh, D., Boyen, X.: Secure identity based encryption without random oracles. In: Franklin, 
M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 443-459. Springer, Heidelberg (2004) 

7. Chevallier-Mames, B.: An efficient CDH-based signature scheme with a tight security reduc- 
tion. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 511-526. Springer, Heidelberg 
(2005) 

8. Coron, J.-S.: On the exact security of full domain hash. In: Bellare, M. (ed.) CRYPTO 2000. 
LNCS, vol. 1880, pp. 229-235. Springer, Heidelberg (2000) 

9. Coron, J.-S.: Optimal security proofs for PSS and other signature schemes. In: Knudsen, L.R. 
(ed.) EUROCRYPT 2002. LNCS, vol. 2332, pp. 272-287. Springer, Heidelberg (2002) 

10. Dodis, Y., Haitner, I., Tentes, A.: On the instantiability of hash-and-sign RSA signatures. In: Cramer, R. (ed.) TCC 2012. LNCS, vol. 7194, pp. 112-132. Springer, Heidelberg (2012)

11. Dodis, Y., Oliveira, R., Pietrzak, K.: On the generic insecurity of the full domain hash. In: 
Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 449-466. Springer, Heidelberg (2005) 

12. Dodis, Y., Reyzin, L.: On the power of claw-free permutations. In: Cimato, S., Galdi, C., 
Persiano, G. (eds.) SCN 2002. LNCS, vol. 2576, pp. 55-73. Springer, Heidelberg (2003) 




N. Fleischhacker, T. Jager, and D. Schröder


13. Fischlin, M., Fleischhacker, N.: Limitations of the meta-reduction technique: The case of Schnorr signatures. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 444-460. Springer, Heidelberg (2013)

14. Fischlin, M., Lehmann, A., Ristenpart, T., Shrimpton, T., Stam, M., Tessaro, S.: Random 
oracles with(out) programmability. In: Abe, M. (ed.) ASIACRYPT 2010. LNCS, vol. 6477, 
pp. 303-320. Springer, Heidelberg (2010) 

15. Garg, S., Bhaskar, R., Lokam, S.V.: Improved bounds on security reductions for discrete 
log based signatures. In: Wagner, D. (ed.) CRYPTO 2008. LNCS, vol. 5157, pp. 93-107. 
Springer, Heidelberg (2008) 

16. Goh, E.J., Jarecki, S.: A signature scheme as secure as the Diffie-Hellman problem. In: 
Biham, E. (ed.) EUROCRYPT 2003. LNCS, vol. 2656, pp. 401-415. Springer, Heidelberg 
(2003) 

17. Goldwasser, S., Micali, S., Rivest, R.L.: A digital signature scheme secure against adaptive 
chosen-message attacks. SIAM Journal on Computing 17(2), 281-308 (1988) 

18. Kakvi, S.A., Kiltz, E.: Optimal security proofs for full domain hash, revisited. In: 
Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 537-553. 
Springer, Heidelberg (2012) 

19. Maurer, U.M.: Abstract models of computation in cryptography. In: Smart, N.P. (ed.) Cryp- 
tography and Coding 2005. LNCS, vol. 3796, pp. 1-12. Springer, Heidelberg (2005) 

20. Neven, G., Smart, N.P., Warinschi, B.: Hash function requirements for Schnorr signatures. J. Mathematical Cryptology 3(1), 69-87 (2009)

21. Paillier, P., Vergnaud, D.: Discrete-log-based signatures may not be equivalent to discrete log. In: Roy, B. (ed.) ASIACRYPT 2005. LNCS, vol. 3788, pp. 1-20. Springer, Heidelberg (2005)

22. Pointcheval, D., Stern, J.: Security proofs for signature schemes. In: Maurer, U.M. (ed.) 
EUROCRYPT 1996. LNCS, vol. 1070, pp. 387-398. Springer, Heidelberg (1996) 

23. Reingold, O., Trevisan, L., Vadhan, S.P.: Notions of reducibility between cryptographic 
primitives. In: Naor, M. (ed.) TCC 2004. LNCS, vol. 2951, pp. 1-20. Springer, Heidelberg (2004) 

24. Rupp, A., Leander, G., Bangerter, E., Dent, A.W., Sadeghi, A.-R.: Sufficient conditions for 
intractability over black-box groups: Generic lower bounds for generalized DL and DH 
problems. In: Pieprzyk, J. (ed.) ASIACRYPT 2008. LNCS, vol. 5350, pp. 489-505. Springer, 
Heidelberg (2008) 

25. Schäge, S.: Tight proofs for signature schemes without random oracles. In: Paterson, K.G. 
(ed.) EUROCRYPT 2011. LNCS, vol. 6632, pp. 189-206. Springer, Heidelberg (2011) 

26. Schnorr, C.-P.: Efficient identification and signatures for smart cards. In: Brassard, G. (ed.) 
CRYPTO 1989. LNCS, vol. 435, pp. 239-252. Springer, Heidelberg (1990) 

27. Schnorr, C.P.: Efficient signature generation by smart cards. Journal of Cryptology 4(3), 
161-174 (1991) 

28. Seurin, Y.: On the exact security of Schnorr-type signatures in the random oracle model. In: 
Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 554-571. 
Springer, Heidelberg (2012) 

29. Shao, Z.: A provably secure short signature scheme based on discrete logarithms. Inf. 
Sci. 177(23), 5432-5440 (2007) 

30. Shoup, V.: Lower bounds for discrete logarithms and related problems. In: Fumy, W. (ed.) 
EUROCRYPT 1997. LNCS, vol. 1233, pp. 256-266. Springer, Heidelberg (1997) 

31. Waters, B.: Efficient identity-based encryption without random oracles. In: Cramer, R. (ed.) 
EUROCRYPT 2005. LNCS, vol. 3494, pp. 114-127. Springer, Heidelberg (2005) 

A Universal Unforgeability under No-Message Attacks 

Consider the following security experiment involving a signature scheme (Gen, Sign, 
Vrfy), an attacker A, and a challenger C. 
1. The challenger C computes a key pair $(X, x) \xleftarrow{\$} \mathrm{Gen}(g)$ and chooses a message 
$m \xleftarrow{\$} \{0,1\}^k$ uniformly at random. It sends $(X, m)$ to the adversary A. 

2. Eventually, A stops, outputting a signature $\sigma$. 

Definition 4. We say that A $(\varepsilon, t)$-breaks the UF-NM-security of (Gen, Sign, Vrfy) if 
A runs in time at most t and $\Pr[A(X, m) = \sigma : \mathrm{Vrfy}(X, m, \sigma) = 1] \ge \varepsilon$. 

Note that UF-NM-security is a very weak security goal for digital signatures. Since we 
are going to prove a negative result, this is not a limitation but only makes our result 
stronger. In fact, if we rule out reductions from some problem $\Pi$ to forging signatures 
in the sense of UF-NM, then the impossibility clearly also holds for stronger security goals, 
like existential unforgeability under adaptive chosen-message attacks [17]. 
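To make the experiment concrete, here is a minimal Python sketch of one run of the UF-NM game. It is instantiated, purely for illustration, with textbook Schnorr signatures over a tiny subgroup of $\mathbb{Z}_Q^*$, with SHA-256 standing in for the hash function. The parameters Q = 2039, P = 1019, G = 4 and all helper names are hypothetical choices, and a group this small offers no security. The trivial guessing adversary only shows the shape of the game (A sees $(X, m)$ and nothing else); it is not an attack.

```python
import hashlib
import random

# Toy Schnorr group, far too small to be secure: Q = 2P + 1 with P and Q prime,
# and G = 4 generates the subgroup of prime order P in Z_Q^*.
Q, P, G = 2039, 1019, 4


def hash_to_zp(R: int, m: str) -> int:
    """Hash (R, m) into Z_P; SHA-256 stands in for the hash function (assumption)."""
    digest = hashlib.sha256(f"{R}|{m}".encode()).digest()
    return int.from_bytes(digest, "big") % P


def gen(rng):
    """Key generation: nonzero secret x in Z_P, public key X = G^x mod Q."""
    x = rng.randrange(1, P)
    return pow(G, x, Q), x


def sign(x: int, m: str, rng):
    """Textbook Schnorr signature sigma = (R, y) with y = r + c*x mod P, c = H(R, m)."""
    r = rng.randrange(P)
    R = pow(G, r, Q)
    return R, (r + hash_to_zp(R, m) * x) % P


def vrfy(X: int, m: str, sigma) -> bool:
    """Accept iff G^y == R * X^c mod Q, where c = H(R, m)."""
    R, y = sigma
    return pow(G, y, Q) == R * pow(X, hash_to_zp(R, m), Q) % Q


def uf_nm_experiment(adversary, rng, k: int = 16) -> bool:
    """One run of the UF-NM game: A only sees (X, m) and must output a signature on m."""
    X, _x = gen(rng)                              # the challenger keeps x to itself
    m = format(rng.getrandbits(k), f"0{k}b")      # m uniform in {0,1}^k
    sigma = adversary(X, m)
    return vrfy(X, m, sigma)


def guessing_adversary(rng):
    """A trivial forger that outputs a random pair (R, y); it should almost never win."""
    return lambda X, m: (rng.randrange(1, Q), rng.randrange(P))


rng = random.Random(1)
X, x = gen(rng)
print(vrfy(X, "0101", sign(x, "0101", rng)))      # True: an honest signature verifies
wins = sum(uf_nm_experiment(guessing_adversary(rng), rng) for _ in range(200))
print(wins, "forgeries out of 200 runs")
```

A random guess succeeds with probability at most about 1/P per run, so the forgery count should stay near zero.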
Abstract. We propose a new characterization of NP using square span 
programs (SSPs). We first characterize NP as affine map constraints 
on small vectors. We then relate this characterization to SSPs, which 
are similar to, but simpler than, Quadratic Span Programs (QSPs) and 
Quadratic Arithmetic Programs (QAPs), since they use a single series 
of polynomials rather than 2 or 3. 

We use SSPs to construct succinct non-interactive zero-knowledge 
arguments of knowledge. For performance, our proof system is defined over 
Type III bilinear groups; proofs consist of just 4 group elements, verified 
in just 6 pairings. Concretely, using the Pinocchio libraries, we estimate 
that proofs will consist of 160 bytes verified in less than 6 ms. 
1 Introduction 

Gennaro, Gentry, Parno and Raykova [GGPR13] proposed a new, influential 
characterization of the complexity class NP using Quadratic Span Programs (QSPs), 
a natural extension of span programs defined by Karchmer and Wigderson [KW93]. 
Their main motivation was the construction of Succinct Non-interactive Arguments 
of Knowledge (SNARKs). Their work has led to fast progress towards practical 
verifiable computation, whereby a resource-constrained client offloads the 
computation of an expensive function to a computationally powerful server or 
cloud, but still intends to verify the correctness of any returned results. For 
instance, using Quadratic Arithmetic Programs (QAPs), a generalization of QSPs 
for arithmetic circuits, Pinocchio [PHGR13] provides evidence that verified 
remote computation can be faster than local computation. At the same time, 
zero-knowledge variants of their construction enable the server to keep intermediate 
and additional values used in the computation private. Such constructions are at the 
forefront of privacy-friendly variants of Bitcoin, such as Pinocchio Coin [DFKP13] 
and Zerocash [BSCG+14]. 
We introduce Square Span Programs (SSPs), a radical simplification of quadratic 
span programs, and we use them to build simpler and more efficient SNARKs 
and Non-Interactive Zero-Knowledge arguments (NIZKs) for the verified 
computation of binary circuits and the verification of SAT solving, two closely related 
problems. Thus, SSPs can be used to build NIZK arguments that support privacy 
properties while guaranteeing high integrity, at a minimal cost for the verifier. 

Square span programs are based on the insight that every 2-input binary gate 
$g(a, b) = c$ can be specified (1) using an affine combination $\ell = \alpha a + \beta b + \gamma c + \delta$ 
of the gate's input and output wires that takes exactly two values, $\ell = 0$ or $\ell = 2$, 
when the wires meet the gate's logical specification; and (2), equivalently, as a 
single 'square' constraint $(\ell - 1)^2 = 1$. Composing such constraints, a satisfying 
assignment for any binary circuit (or any SAT problem) can be specified first 
as a set of affine map constraints, then as a constraint on the span of a set of 
polynomials, defining the square span program for this circuit. 

Due to their conceptual simplicity, SSPs offer several advantages over previous 
constructions for binary circuits. Their reduced number of constraints leads to 
smaller programs, and to lower sizes and degrees for the polynomials required 
to represent them, which in turn reduces the computational complexity of 
proving and verifying NIZK arguments. Notably, their simpler 'square' form 
requires only a single polynomial to be evaluated for verification (instead of two 
for earlier QSPs, and three for Pinocchio [PHGR13]), leading to a simpler and 
more compact setup, smaller keys, and fewer operations for proof and 
verification. 

The resulting SSP-based SNARKs may be the most compact constructions 
to date. For performance, our proof system is defined over Type III bilinear 
groups; to this end, we revisit and restate known assumptions for Type III 
bilinear groups. The communicated proofs consist of just 4 group elements (3 in the 
left group, and one in the right group); they can be verified in just 6 pairings, 
plus one multiplication for each (non-zero) bit of input, irrespective of the size 
of the circuit. Concretely, using the same groups as in the implementation of 
Pinocchio, we arrive at 160-byte proofs that we estimate can be verified in less 
than 6 ms, for circuits with millions of gates. For instance, our SNARKs would 
be entirely adequate to verify the solutions of large SAT problems offloaded to 
specialized servers and tools, such as those available in the annual SAT 
competition¹, without the need to communicate (or even reveal) their complete solutions. 
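As a quick sanity check of the 160-byte figure, the arithmetic below assumes compressed encodings of 32 bytes per left-group element and 64 bytes per right-group element. These are the sizes one would expect for a roughly 256-bit Barreto-Naehrig curve. They are our assumption and do not appear in the text.

```python
# Assumed compressed point sizes for a ~256-bit BN curve (illustrative, not from the paper).
G1_BYTES = 32   # left group: one base-field coordinate plus a sign bit
G2_BYTES = 64   # right group: one coordinate in the quadratic extension plus a sign bit


def proof_size(n_left: int = 3, n_right: int = 1) -> int:
    """Bytes for a proof with n_left left-group and n_right right-group elements."""
    return n_left * G1_BYTES + n_right * G2_BYTES


print(proof_size())  # 3 * 32 + 1 * 64 = 160
```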

2 Square Span Programs 

In this section we provide new characterizations of languages in NP. First, 
we show that circuit satisfiability can be recast as a set of constraints on affine 
maps over the integers. Next, we show in Section 2.2 that this leads to the 
NP-completeness of square span programs as defined below. The reader may find 
the example in Section 2.3 useful to illustrate the transformation from circuit 
satisfiability to square span programs. We compare square span programs to 
quadratic span programs in Section 2.4. 

1 http://satcompetition.org/ 

Definition 1 (Square span program). A square span program Q over the 
field F consists of m + 1 polynomials $v_0(x), v_1(x), \dots, v_m(x)$ and a target 
polynomial $t(x)$ such that $\deg(v_i(x)) \le \deg(t(x))$ for all $i = 0, \dots, m$. 

We say that the square span program Q has size m and degree $d = \deg(t(x))$. 
We say that Q accepts an input $(a_1, \dots, a_\ell) \in \mathbb{F}^\ell$ if and only if there exist 
$a_{\ell+1}, \dots, a_m \in \mathbb{F}$ satisfying 

$$t(x) \text{ divides } \Big(v_0(x) + \sum_{i=1}^{m} a_i v_i(x)\Big)^2 - 1.$$

We say that Q verifies a boolean function $f : \{0,1\}^\ell \to \{0,1\}$ if it accepts 
exactly those inputs $a \in \mathbb{F}^\ell$ that satisfy $a \in \{0,1\}^\ell$ and $f(a) = 1$. 
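The acceptance condition of Definition 1 can be checked mechanically over a small prime field. The following is a minimal sketch in which polynomials over $\mathbb{Z}_p$ are coefficient lists, lowest degree first. The helper names are hypothetical. The brute-force search over the witness part $a_{\ell+1}, \dots, a_m$ is only feasible for toy parameters.

```python
from itertools import product

# Polynomials over Z_p are coefficient lists, lowest degree first.


def poly_mul(f, g, p):
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def poly_rem(f, t, p):
    """Remainder of f modulo t over Z_p (t's leading coefficient must be invertible)."""
    f = [c % p for c in f]
    inv_lead = pow(t[-1], -1, p)
    while len(f) >= len(t):
        q = f[-1] * inv_lead % p
        shift = len(f) - len(t)
        for i, c in enumerate(t):
            f[shift + i] = (f[shift + i] - q * c) % p
        f.pop()                          # the leading coefficient is now zero
    return f


def ssp_check(vs, t, a, p):
    """True iff t(x) divides (v_0(x) + sum_i a_i v_i(x))^2 - 1 for a full assignment a_1..a_m."""
    f = [0] * max(len(v) for v in vs)
    for coeff, v in zip([1] + list(a), vs):
        for k, c in enumerate(v):
            f[k] = (f[k] + coeff * c) % p
    g = poly_mul(f, f, p)
    g[0] = (g[0] - 1) % p
    return all(c == 0 for c in poly_rem(g, t, p))


def ssp_accepts(vs, t, inputs, p):
    """Brute-force the existential quantifier over a_{l+1}, ..., a_m (tiny p and m only)."""
    free = len(vs) - 1 - len(inputs)
    return any(ssp_check(vs, t, list(inputs) + list(ext), p)
               for ext in product(range(p), repeat=free))


# Hypothetical toy SSP over Z_11 with m = 1: v_0 = -1, v_1 = 2, t(x) = x - 1.
vs, t = [[-1 % 11], [2]], [-1 % 11, 1]
print([a for a in range(11) if ssp_accepts(vs, t, [a], 11)])  # [0, 1]
```

This toy program accepts exactly the Boolean inputs, because $(2a_1 - 1)^2 = 1$ holds in $\mathbb{Z}_{11}$ only for $a_1 \in \{0, 1\}$.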

In the definition, we may see f as a binary circuit or, more abstractly, as a 
logical specification of a satisfiability problem. In our NIZK argument system 
in Section 3.3 we will split the $\ell$ inputs into $\ell_u$ public and $\ell_w$ private inputs. 
We remark that the public 'inputs' are considered from the viewpoint of the 
verifier: for an outsourced computation, for instance, they may include both the 
inputs sent by the client and the outputs returned by the server performing the 
computation together with its proof; for a SAT problem, they may provide a 
partial instantiation of the problem, or a part of its solution. 

This treatment is strictly more general than classic Circuit-SAT, which only 
cares about satisfiability and thus corresponds to the special case $\ell_u = 0$, i.e., 
Q verifies a circuit C if it accepts exactly those w where C(w) = 1. Alternatively, 
if we want the same SSP Q to handle different circuits, it may be useful to let f 
be a universal circuit that takes as input an $\ell_u$-bit description of a freely chosen 
circuit C and an $\ell_w$-bit value w and returns 1 if and only if C(w) = 1. 

2.1 The NP-completeness of Affine Map Constraints 

In this section we will show that circuit satisfiability can be recast as a set of 
constraints on the image of an affine map $a \mapsto aV + b$. 

Groth, Ostrovsky and Sahai [GOS12] observed that a NAND gate with input wires 
a, b and output wire c can be "linearized". Given values $a, b, c \in \{0, 1\}$, with 0 
meaning false and 1 meaning true, and writing $\bar c$ for $1 - c$, we have 

$$c = \neg(a \wedge b) \quad\text{if and only if}\quad a + b - 2\bar c \in \{0, 1\}.$$


All logic gates with fan-in 2 can be linearized. Without loss of generality, we 
ignore gates corresponding to $c = a$, $c = \bar a$, $c = b$, $c = \bar b$, $c = 0$ and $c = 1$, 
since they are trivial and can be eliminated from a circuit. This leaves us with 10 
types of logic gates. Table 1 displays their truth tables and their linearizations. 

Let C be a circuit with m wires and n fan-in 2 gates. We can use linearization 
of the logic gates to rewrite the circuit as a set of constraints on the output of 
an affine map. 
Table 1. Linearization of logic gates with inputs a, b and output c. We omit the 6 
remaining gates, which depend on at most one input and are not used in circuits. 

Gate | c for (a, b) = (0,0), (0,1), (1,0), (1,1) | Linearization 
AND | 0, 0, 0, 1 | $a + b - 2c \in \{0, 1\}$ 
OR | 0, 1, 1, 1 | $\bar a + \bar b - 2\bar c \in \{0, 1\}$ 
XOR | 0, 1, 1, 0 | $a + b + c \in \{0, 2\}$ 
NAND | 1, 1, 1, 0 | $a + b - 2\bar c \in \{0, 1\}$ 
NOR | 1, 0, 0, 0 | $\bar a + \bar b - 2c \in \{0, 1\}$ 
XNOR | 1, 0, 0, 1 | $a + b + \bar c \in \{0, 2\}$ 
$a \wedge \bar b$ | 0, 0, 1, 0 | $a + \bar b - 2c \in \{0, 1\}$ 
$\bar a \wedge b$ | 0, 1, 0, 0 | $\bar a + b - 2c \in \{0, 1\}$ 
$\overline{a \wedge \bar b}$ | 1, 1, 0, 1 | $a + \bar b - 2\bar c \in \{0, 1\}$ 
$\overline{\bar a \wedge b}$ | 1, 0, 1, 1 | $\bar a + b - 2\bar c \in \{0, 1\}$ 
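The linearizations can be checked exhaustively. The Python sketch below verifies, for each of the 10 gates, that the linear expression lands in the allowed set exactly on the rows of the truth table. Here bar(x) = 1 - x, and the dictionary keys are hypothetical labels for the gates of Table 1.

```python
from itertools import product


def bar(x):
    """Negation of a bit, written as 1 - x as in the text."""
    return 1 - x


# name: (truth function c = g(a, b), linearization L(a, b, c), allowed values of L)
GATES = {
    "AND":     (lambda a, b: a & b,             lambda a, b, c: a + b - 2 * c,                {0, 1}),
    "OR":      (lambda a, b: a | b,             lambda a, b, c: bar(a) + bar(b) - 2 * bar(c), {0, 1}),
    "XOR":     (lambda a, b: a ^ b,             lambda a, b, c: a + b + c,                    {0, 2}),
    "NAND":    (lambda a, b: 1 - (a & b),       lambda a, b, c: a + b - 2 * bar(c),           {0, 1}),
    "NOR":     (lambda a, b: 1 - (a | b),       lambda a, b, c: bar(a) + bar(b) - 2 * c,      {0, 1}),
    "XNOR":    (lambda a, b: 1 - (a ^ b),       lambda a, b, c: a + b + bar(c),               {0, 2}),
    "a&~b":    (lambda a, b: a & bar(b),        lambda a, b, c: a + bar(b) - 2 * c,           {0, 1}),
    "~a&b":    (lambda a, b: bar(a) & b,        lambda a, b, c: bar(a) + b - 2 * c,           {0, 1}),
    "~(a&~b)": (lambda a, b: 1 - (a & bar(b)),  lambda a, b, c: a + bar(b) - 2 * bar(c),      {0, 1}),
    "~(~a&b)": (lambda a, b: 1 - (bar(a) & b),  lambda a, b, c: bar(a) + b - 2 * bar(c),      {0, 1}),
}


def check_linearization(gate, lin, allowed):
    """True iff lin(a, b, c) is in `allowed` exactly when c == gate(a, b), over all of {0,1}^3."""
    return all((lin(a, b, c) in allowed) == (c == gate(a, b))
               for a, b, c in product((0, 1), repeat=3))


for name, (gate, lin, allowed) in GATES.items():
    print(name, check_linearization(gate, lin, allowed))  # every line ends in True
```

Multiplying the $\{0, 1\}$-valued linearizations by 2 gives the uniform $\{0, 2\}$ form used in the proof of Theorem 1.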


Theorem 1. For any circuit C with m wires and n fan-in 2 gates for a total 
size of d = m + n, there exist a matrix $V \in \mathbb{Z}^{m \times d}$ and a vector $b \in \mathbb{Z}^d$ such that 
C is satisfiable if and only if there is a vector $a \in \mathbb{Z}^m$ satisfying $aV + b \in \{0, 2\}^d$. 

The matrix V and the vector b can be constructed such that $aV + b \in \{0, 2\}^d$ 
implies $a \in \{0, 1\}^m$ and corresponds to the values on the wires in a 
satisfying assignment for C, with the first $\ell$ bits being the input wires. 

Proof. We represent an assignment to the wires as a vector $a \in \mathbb{Z}^m$. The 
assignment is a satisfying witness for the circuit if and only if the entries belong to 
{0, 1}, the entries respect all gates, and the output wire is 1. 

It is easy to impose the condition $a \in \{0, 1\}^m$ by requiring $a(2I) \in \{0, 2\}^m$. 
(Alternatively, whenever $a_i \in \{0, 1\}$ is clear from the context, for instance for 
the public inputs $a_1, \dots, a_{\ell_u}$, this check can be omitted.) 

Since $\bar a = 1 - a$, $\bar b = 1 - b$ and $\bar c = 1 - c$, and after scaling some of the gate 
equations from Table 1 by a factor 2, we can write all gate equations in the form 
$\alpha a + \beta b + \gamma c + \delta \in \{0, 2\}$. We want the circuit output wire $c_{out}$ to have value 1. 
We do that by adding the term $3\bar c_{out}$ to the linearization of the output gate, 
since if $c_{out} = 0$ this adds 3 to the linear equation and brings us outside {0, 2} 
regardless of the type of logic gate. 
Define $G \in \mathbb{Z}^{m \times n}$ and $\delta \in \mathbb{Z}^n$ such that $aG + \delta \in \{0, 2\}^n$ corresponds to the 
linearization of the gates as described above, and let 

$$V = \big(2I \mid G\big) \quad\text{and}\quad b = \big(0 \mid \delta\big).$$

The existence of a such that 

$$aV + b \in \{0, 2\}^d$$

is equivalent to a satisfying assignment to the wires in the circuit. □ 
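The construction in this proof can be sketched in Python for a hypothetical two-gate circuit. Its wires are the inputs $a_0, a_1$, then $a_2 = a_0 \wedge a_1$, and the output $a_3 = a_2 \oplus a_0$ (0-indexed, unlike the text). Each gate column is written directly in the scaled form $\alpha a + \beta b + \gamma c + \delta \in \{0, 2\}$, and the term $3\bar c_{out} = 3 - 3c_{out}$ is folded into the output gate. The brute-force search over a small integer range only confirms, for this toy circuit, that the 2I block forces Boolean wires and that the sole solution is the satisfying assignment.

```python
from itertools import product


def build_affine_map(m, gates, out_gate):
    """Build V = (2I | G) and b = (0 | delta) as in the proof of Theorem 1.

    gates: list of (coeffs, delta), where coeffs maps wire index -> integer coefficient,
           already scaled so that a valid gate evaluates to 0 or 2.
    out_gate: (gate index, output wire index); 3 * (1 - c_out) is added to that gate.
    """
    n = len(gates)
    V = [[0] * (m + n) for _ in range(m)]
    b = [0] * (m + n)
    for i in range(m):
        V[i][i] = 2                     # the 2I block forces a_i in {0, 1}
    for j, (coeffs, delta) in enumerate(gates):
        for wire, coeff in coeffs.items():
            V[wire][m + j] += coeff
        b[m + j] = delta
    g, w = out_gate
    V[w][m + g] -= 3                    # 3 * (1 - c_out) = 3 - 3 * c_out
    b[m + g] += 3
    return V, b


def affine_image(a, V, b):
    """Compute the row vector aV + b."""
    return [sum(a[i] * V[i][k] for i in range(len(a))) + b[k] for k in range(len(b))]


gates = [({0: 2, 1: 2, 2: -4}, 0),    # a2 = a0 AND a1, scaled: 2(a0 + a1 - 2 a2)
         ({2: 1, 0: 1, 3: 1}, 0)]     # a3 = a2 XOR a0: a2 + a0 + a3
V, b = build_affine_map(4, gates, out_gate=(1, 3))
solutions = [a for a in product(range(-1, 3), repeat=4)
             if all(x in (0, 2) for x in affine_image(a, V, b))]
print(solutions)  # [(1, 0, 0, 1)]
```

For this circuit V has 10 non-zero entries, which matches the sparsity count m + 3n stated in the text.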

Note that V and b as we constructed them have some additional properties. 
The matrix V is sparse, since it only has m+3n non-zero entries. The row vectors 
of V and b are all linearly independent. Furthermore, all entries in V and b are 
small integers. The small size of the integers gives us the following corollary. 

Corollary 1. For any circuit C with m wires and n fan-in 2 gates and for any 
p > 8 there exist a matrix $V \in \mathbb{Z}_p^{m \times d}$ (with d = m + n) and a vector $b \in \mathbb{Z}_p^d$ 
(giving us m + 1 linearly independent row vectors) such that C is satisfiable if and 
only if there exists a vector $a \in \mathbb{Z}_p^m$ satisfying $aV + b \in \{0, 2\}^d$. Furthermore, if 
$aV + b \in \{0, 2\}^d$ then $a \in \{0, 1\}^m$ and $C(a_1, \dots, a_\ell) = 1$. 

Relation to closest vector problem. There is an interesting connection between 
our construction of affine map constraints and the closest vector problem for 
integer lattices using the max-norm $\ell_\infty$. Consider a circuit made just from NAND 
gates; then the affine map aV + b constructed in the proof of Theorem 1 cannot 
take the value 1 for any index i = 1, ..., d, which means the circuit is satisfiable 
if and only if $aV + b - \mathbf{1} \in \{-1, 0, 1\}^d$. This is equivalent to saying that 
the lattice generated by the rows of V has a vector aV with distance at most 
1 from the target vector $t = \mathbf{1} - b$, i.e., $\|aV - t\|_\infty \le 1$, if and only if the 
circuit is satisfiable. Our construction therefore gives a very direct reduction from 
circuit satisfiability to the closest vector problem in integer lattices. The 
NP-hardness of the closest (nearest) vector problem was first demonstrated by van 
Emde Boas [vEB81], but using a more complicated reduction that relied on the 
partition problem. 

2.2 The NP-completeness of Square Span Programs 

We will now connect affine maps to square span programs, which gives a reduction 
from circuit satisfiability to square span programs. 

Corollary 1 can be reformulated to say that, for any circuit C and p > 8, 
there exist V and b such that C is satisfiable if and only if there exists $a \in \mathbb{Z}_p^m$ 
satisfying 

$$(aV + b) \circ (aV + b - \mathbf{2}) = \mathbf{0},$$

where $\circ$ denotes the Hadamard product (entry-wise multiplication). Since 
$x(x - 2) = (x - 1)^2 - 1$, we can rewrite this condition as 

$$(aV + b - \mathbf{1}) \circ (aV + b - \mathbf{1}) = \mathbf{1}.$$
Let $r_1, \dots, r_d$ be d distinct elements of $\mathbb{Z}_p$ for a prime $p > \max(d, 8)$. Define 
$v_0(x), v_1(x), \dots, v_m(x)$ as the polynomials of degree at most d - 1 satisfying 

$$v_0(r_j) = b_j - 1 \quad\text{and}\quad v_i(r_j) = V_{ij}.$$

We can now reformulate Corollary 1 again. The circuit C is satisfiable if and 
only if there exists $a \in \mathbb{Z}_p^m$ such that for all $r_j$ 

$$\Big(v_0(r_j) + \sum_{i=1}^{m} a_i v_i(r_j)\Big)^2 = 1.$$

Since the evaluations in $r_1, \dots, r_d$ uniquely determine the polynomial 
$f(x) = v_0(x) + \sum_{i=1}^{m} a_i v_i(x)$, we can rewrite the condition as 

$$\Big(v_0(x) + \sum_{i=1}^{m} a_i v_i(x)\Big)^2 \equiv 1 \mod \prod_{j=1}^{d}(x - r_j).$$

Theorem 2. A circuit C with m wires and n fan-in 2 gates has, for any prime 
$p > \max(d, 8)$, a square span program of size m and degree d = m + n that verifies 
it over $\mathbb{Z}_p$. 

Proof. From the discussion above, we see that for any circuit C with m wires 
and n gates there exist polynomials $v_0(x), v_1(x), \dots, v_m(x)$ and distinct roots 
$r_1, \dots, r_d$ such that C is satisfiable if and only if there exist $a_1, \dots, a_m$ with 

$$\prod_{j=1}^{d}(x - r_j) \text{ divides } \Big(v_0(x) + \sum_{i=1}^{m} a_i v_i(x)\Big)^2 - 1.$$

Define $t(x) = \prod_{j=1}^{d}(x - r_j)$ to get an SSP $Q = (v_0(x), v_1(x), \dots, v_m(x), t(x))$ 
that verifies C over $\mathbb{Z}_p$. 


2.3 Example 

As a small example of the process of generating a square span program, consider 
a circuit consisting of a single XOR gate $a_3 = a_1 \oplus a_2$ (here $\ell = \ell_u + \ell_w = 2$ 
with $\ell_u = 0$ and $\ell_w = 2$). To guarantee $a_1, a_2, a_3 \in \{0, 1\}$ and that the XOR gate is 
respected, we use the constraints $2a_i \in \{0, 2\}$ and $a_1 + a_2 + a_3 \in \{0, 2\}$. The output 
should be $a_3 = 1$, which we represent with the constraint $3\bar a_3 = 3(1 - a_3) = 0$. 
We add the latter constraint to the output wire's equation to get the combined 
$a_1 + a_2 - 2a_3 + 3 \in \{0, 2\}$, which at the same time guarantees $a_3 = a_1 \oplus a_2$ and 
$a_3 = 1$. We can represent the constraints as 

$$aV + b = (a_1, a_2, a_3)\begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & -2 \end{pmatrix} + (0, 0, 0, 3) \in \{0, 2\}^4.$$

The satisfiability of the circuit can therefore be represented by 4 quadratic 
equations 

$$(2a_1 - 1)^2 = 1, \quad (2a_2 - 1)^2 = 1, \quad (2a_3 - 1)^2 = 1, \quad (a_1 + a_2 - 2a_3 + 2)^2 = 1,$$

corresponding to $(aV + b - \mathbf{1}) \circ (aV + b - \mathbf{1}) = \mathbf{1}$. 

To get a square span program, let p > 8 be a prime and $r_1, r_2, r_3, r_4$ be four 
distinct elements in $\mathbb{Z}_p$. Pick polynomials $v_0(x), v_1(x), v_2(x), v_3(x)$ of degree at most 3 
such that 

$$(v_0(r_1), v_0(r_2), v_0(r_3), v_0(r_4)) = b - \mathbf{1} = (-1, -1, -1, 2)$$

and 

$$\begin{pmatrix} v_1(r_1) & v_1(r_2) & v_1(r_3) & v_1(r_4) \\ v_2(r_1) & v_2(r_2) & v_2(r_3) & v_2(r_4) \\ v_3(r_1) & v_3(r_2) & v_3(r_3) & v_3(r_4) \end{pmatrix} = V = \begin{pmatrix} 2 & 0 & 0 & 1 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & -2 \end{pmatrix}.$$

Let $t(x) = (x - r_1)(x - r_2)(x - r_3)(x - r_4)$ to get a square span program 
$(v_0(x), v_1(x), v_2(x), v_3(x), t(x))$ for the circuit such that 

$$t(x) \text{ divides } \big(v_0(x) + a_1 v_1(x) + a_2 v_2(x) + a_3 v_3(x)\big)^2 - 1$$

if and only if $a_1, a_2, a_3$ satisfy the circuit, i.e., $a_1, a_2 \in \{0, 1\}$, $a_3 = 1$ and 
$a_3 = a_1 \oplus a_2$. 
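The example can be reproduced end to end in Python. The sketch interpolates $v_0, \dots, v_3$ from $b - \mathbf{1}$ and the rows of V, builds $t(x)$, and then searches all of $\mathbb{Z}_p^3$ for assignments that make $t(x)$ divide $(v_0(x) + \sum_i a_i v_i(x))^2 - 1$. The choices p = 11 and $(r_1, \dots, r_4) = (1, 2, 3, 4)$ are illustrative assumptions that satisfy the requirements stated in the text. The helper names are hypothetical.

```python
from itertools import product

P = 11                 # a prime p > 8 (illustrative choice)
ROOTS = [1, 2, 3, 4]   # four distinct elements r_1..r_4 of Z_p (illustrative choice)


def poly_add(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [(x + y) % P for x, y in zip(f, g)]


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] = (out[i + j] + x * y) % P
    return out


def poly_rem_monic(f, t):
    """Remainder of f divided by the monic polynomial t (coefficients lowest degree first)."""
    f = f[:]
    while len(f) >= len(t):
        lead, shift = f[-1], len(f) - len(t)
        for i, c in enumerate(t):
            f[shift + i] = (f[shift + i] - lead * c) % P
        f.pop()
    return f


def interpolate(values):
    """Lagrange interpolation: the polynomial of degree <= 3 with f(ROOTS[j]) = values[j]."""
    result = [0]
    for j, rj in enumerate(ROOTS):
        basis, denom = [1], 1
        for k, rk in enumerate(ROOTS):
            if k != j:
                basis = poly_mul(basis, [-rk % P, 1])   # multiply by (x - r_k)
                denom = denom * (rj - rk) % P
        scale = values[j] * pow(denom, -1, P) % P
        result = poly_add(result, [scale * c % P for c in basis])
    return result


v = [interpolate([-1, -1, -1, 2]),   # v_0 from b - 1
     interpolate([2, 0, 0, 1]),      # v_1, v_2, v_3 from the rows of V
     interpolate([0, 2, 0, 1]),
     interpolate([0, 0, 2, -2])]
t = [1]
for r in ROOTS:
    t = poly_mul(t, [-r % P, 1])     # t(x) = (x - r_1)(x - r_2)(x - r_3)(x - r_4)


def accepts(a):
    f = v[0]
    for ai, vi in zip(a, v[1:]):
        f = poly_add(f, [ai * c % P for c in vi])
    g = poly_add(poly_mul(f, f), [P - 1])            # f(x)^2 - 1
    return all(c == 0 for c in poly_rem_monic(g, t))


print([a for a in product(range(P), repeat=3) if accepts(a)])  # [(0, 1, 1), (1, 0, 1)]
```

Exactly the two satisfying assignments of the XOR circuit with output 1 are accepted, even though the search ranges over all of $\mathbb{Z}_{11}^3$ rather than only over bits.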


2.4 Comparison to Quadratic Span Programs 

Square span programs can be seen as a simplification of quadratic span programs. 
Below we recall the definition of quadratic span programs given by Gennaro, 
Gentry, Parno and Raykova [GGPR13]. 

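Definition 2. A quadratic span program over a field F contains two sets of 
polynomials $\mathcal{V} = \{v_0(x), \dots, v_m(x)\}$ and $\mathcal{W} = \{w_0(x), \dots, w_m(x)\}$ and a target 
polynomial t(x). It also contains a partition of the indices $\mathcal{I} = \{1, \dots, m\}$ into 
$\mathcal{I} = \mathcal{I}_{labeled} \cup \mathcal{I}_{free}$ and a further partition $\mathcal{I}_{labeled} = \bigcup_{i=1}^{\ell} (\mathcal{I}_{i,0} \cup \mathcal{I}_{i,1})$. 

For input² $y \in \{0, 1\}^\ell$, let $\mathcal{I}_y = \mathcal{I}_{free} \cup \bigcup_{i=1}^{\ell} \mathcal{I}_{i,y_i}$ be the set of indices that 
"belong" to y. The quadratic span program accepts an input $y \in \{0, 1\}^\ell$ if and 
only if there exist $a_i, b_i \in F$ such that 

$$t(x) \text{ divides } \Big(v_0(x) + \sum_{i \in \mathcal{I}_y} a_i v_i(x)\Big) \cdot \Big(w_0(x) + \sum_{i \in \mathcal{I}_y} b_i w_i(x)\Big).$$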

We say the quadratic span program verifies a boolean function $f : \{0, 1\}^\ell \to$ 
$\{0, 1\}$ if it accepts exactly those inputs y where f(y) = 1. We say the size of the 
quadratic span program is m and the degree is $\deg(t(x))$. 



2 In the rest of the paper, we will be using inputs of the form y = (u, w), where u, of 
size $\ell_u$, is considered public and w is considered private. 
Table 2. Costs compared with prior work ($\ell$ input wires, out of which $\ell_u$ are public, 
m wires in total and n gates). In a circuit with fan-in 2 gates $m \le 2n + 1$, so we get 
rough bounds of size 2n and degree 3n when computed as a function of the number of 
gates n only (ignoring $\ell_u$). 

Size and degree of span programs 

Span program | Size | Degree 
Quadratic span programs [GGPR13] | $36n$ | $130n$ 
Quadratic span programs (Lipmaa) [Lip13] | $14n - 14\ell - 2$ | $11n - 12\ell - 2$ 
Square span programs | $m$ | $m + n - \ell_u$ 


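A square span program uses the simpler condition 

$$t(x) \text{ divides } \Big(v_0(x) + \sum_{i=1}^{m} a_i v_i(x)\Big)^2 - 1,$$

which is equivalent to 

$$t(x) \text{ divides } \Big(v_0(x) + 1 + \sum_{i=1}^{m} a_i v_i(x)\Big) \cdot \Big(v_0(x) - 1 + \sum_{i=1}^{m} a_i v_i(x)\Big).$$

A square span program can therefore be seen as a particularly simple type 
of quadratic span program where, writing $v_0'(x) = v_0(x) + 1$, we have $w_0(x) = v_0'(x) - 2$, 
$w_i(x) = v_i(x)$ and $a_i = b_i$. Furthermore, $\mathcal{I}_{labeled} = \{1, \dots, \ell\}$ with $\mathcal{I}_{i,y_i} = \{i\}$ and 
$\mathcal{I}_{i,\bar y_i} = \emptyset$, and $\mathcal{I}_{free} = \{\ell + 1, \dots, m\}$. 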

The compilation of a circuit into a quadratic span program in [GGPR13] has 
a significant overhead. For a circuit with $\ell$ input wires, m wires in total and 
n gates, the size of the resulting quadratic span program is 36n and the degree 
is 130n. Lipmaa [Lip13] gave a class of more efficient quadratic span programs. 
Included in this class is a quadratic span program of size $14n - 14\ell - 2$ and 
degree $11n - 12\ell - 2$. In comparison with these works, our square span programs 
are much more compact, with size $m - \ell_u$ and degree $m + n - \ell_u$ 
(assuming the verifier checks that its inputs are all in {0, 1}). These costs are 
summarised in Table 2. 

A further advantage compared to the previous works is that we consider all 
types of logic gates, whereas they only consider NAND, AND and OR gates. We 
would expect that their constructions can be generalized to handle other logic 
gates but do not know whether this would increase the cost. 


Remark. All three results prove that a circuit (fixed when the quadratic span 
program is generated) is satisfied for public input u and private input w. 
Universal circuits allow using a single program for all n'-gate circuits at the cost of 
$n = 19\, n' \log n'$ [Val76]. 
3 Succinct Non-interactive Arguments of Knowledge 

We will now use square span programs to construct succinct non-interactive 
zero-knowledge arguments of knowledge using bilinear groups. 

Notation. Given two functions $f, g : \mathbb{N} \to [0, 1]$ we write $f(\lambda) \approx g(\lambda)$ when 
$|f(\lambda) - g(\lambda)| = \lambda^{-\omega(1)}$. We say that f is negligible when $f(\lambda) \approx 0$ and that f is 
overwhelming when $f(\lambda) \approx 1$. 

We write $y = A(x; r)$ when the algorithm A, on input x and randomness r, 
outputs y. We write $y \leftarrow A(x)$ for the process of picking randomness r at random 
and setting $y = A(x; r)$. We also write $y \leftarrow S$ for sampling y uniformly at random 
from the set S. We will assume it is possible to sample uniformly at random from 
sets such as $\mathbb{Z}_p$. 

Following Abe and Fehr [AF07], we write $(y; z) \leftarrow (A \,\|\, X_A)(x)$ when A on 
input x outputs y and $X_A$ on the same input (including random coins) outputs z. 


3.1 Non-interactive Zero-Knowledge Arguments of Knowledge 

Let $\{\mathcal{R}_\lambda\}_{\lambda \in \mathbb{N}}$ be a sequence of families of efficiently decidable binary relations R. 
For pairs $(u, w) \in R$ we call u the statement and w the witness. A non-interactive 
argument for $\{\mathcal{R}_\lambda\}_{\lambda \in \mathbb{N}}$ is a quadruple of efficient algorithms (Setup, Prove, Vfy, 
Sim) working as follows: 

$(\sigma, \tau) \leftarrow \mathsf{Setup}(1^\lambda, R)$: the setup algorithm takes as input a security parameter $\lambda$ 
and a relation $R \in \mathcal{R}_\lambda$ and returns a common reference string $\sigma$ and a 
simulation trapdoor $\tau$ for the relation R. 

$\pi \leftarrow \mathsf{Prove}(\sigma, u, w)$: the prover algorithm takes as input a common reference 
string $\sigma$ for a relation R and $(u, w) \in R$ and returns an argument $\pi$. 

$0/1 \leftarrow \mathsf{Vfy}(\sigma, u, \pi)$: the verification algorithm takes as input a common reference 
string $\sigma$, a statement u and an argument $\pi$ and returns 0 (reject) or 1 (accept). 

$\pi \leftarrow \mathsf{Sim}(\tau, u)$: the simulator takes as input a simulation trapdoor $\tau$ and a 
statement u and returns an argument $\pi$. 

Definition 3. We say (Setup, Prove, Vfy, Sim) is a perfect non-interactive 
zero-knowledge argument of knowledge for $\{\mathcal{R}_\lambda\}_{\lambda \in \mathbb{N}}$ if it has perfect completeness, 
perfect zero-knowledge and computational knowledge soundness as defined below. 


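Perfect completeness. Completeness says that, given any true statement, 
an honest prover should be able to convince an honest verifier. For all $\lambda \in \mathbb{N}$, 
$R \in \mathcal{R}_\lambda$ and $(u, w) \in R$: 

$$\Pr\big[(\sigma, \tau) \leftarrow \mathsf{Setup}(1^\lambda, R);\ \pi \leftarrow \mathsf{Prove}(\sigma, u, w) : \mathsf{Vfy}(\sigma, u, \pi) = 1\big] = 1.$$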


Perfect zero-knowledge. An argument is zero-knowledge if it does not leak 
any information besides the truth of the statement. We say (Setup, Prove, Vfy, 
Sim) is perfect zero-knowledge if for all $\lambda \in \mathbb{N}$, $R \in \mathcal{R}_\lambda$, $(u, w) \in R$ and all 
adversaries A, we have 

$$\Pr\big[(\sigma, \tau) \leftarrow \mathsf{Setup}(1^\lambda, R);\ \pi \leftarrow \mathsf{Prove}(\sigma, u, w) : A(\sigma, \tau, \pi) = 1\big]$$
$$= \Pr\big[(\sigma, \tau) \leftarrow \mathsf{Setup}(1^\lambda, R);\ \pi \leftarrow \mathsf{Sim}(\tau, u) : A(\sigma, \tau, \pi) = 1\big].$$


Computational knowledge soundness. We call (Setup, Prove, Vfy, Sim) an 
argument of knowledge if there is an extractor that can compute a witness 
whenever the adversary produces a valid argument. The extractor gets full access to 
the adversary's state, including any random coins. 

Formally, we require that, for all sequences $(R_\lambda)_{\lambda \in \mathbb{N}}$ of polynomially bounded 
relations in $\{\mathcal{R}_\lambda\}_{\lambda \in \mathbb{N}}$ and non-uniform polynomial time adversaries A, there 
exists a non-uniform polynomial time extractor $X_A$ such that 

$$\Pr\left[\begin{array}{l} (\sigma, \tau) \leftarrow \mathsf{Setup}(1^\lambda, R_\lambda);\\ ((u, \pi); w) \leftarrow (A \,\|\, X_A)(\sigma) \end{array} : \begin{array}{l} (u, w) \notin R_\lambda \\ \wedge\ \mathsf{Vfy}(\sigma, u, \pi) = 1 \end{array}\right] \approx 0.$$


Remark. Our notion of knowledge soundness guarantees security against an 
adaptive adversary, cf. [AF07], that chooses the instance u depending on the 
CRS $\sigma$. However, to get adaptive security for circuit satisfiability, $\mathcal{R}_\lambda$ has to 
be universal, i.e., it has to check that a circuit u is satisfiable. For performance 
reasons, this is usually not what one wants, and adaptive soundness for a more 
restrictive $\mathcal{R}_\lambda$ is preferable. See Lipmaa [Lip14] for how to achieve adaptive 
soundness for some NP-complete languages, not including circuit satisfiability, 
while avoiding universal circuits. 


3.2 Bilinear Groups 

Let $\mathcal{G}$ be a bilinear group generator that, on security parameter $\lambda$, returns 
$(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e) \leftarrow \mathcal{G}(1^\lambda)$ with the following properties: 

– $\mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T$ are groups of prime order p; 

– $e : \mathbb{G} \times \hat{\mathbb{G}} \to \mathbb{G}_T$ is a bilinear map, that is, for all $U \in \mathbb{G}$, $\hat V \in \hat{\mathbb{G}}$, $a, b \in \mathbb{Z}$, 
we have $e(U^a, \hat V^b) = e(U, \hat V)^{ab}$; 

– if G is a generator for $\mathbb{G}$ and $\hat G$ is a generator for $\hat{\mathbb{G}}$, then $e(G, \hat G)$ is a generator 
for $\mathbb{G}_T$; and 

– there are efficient algorithms for computing group operations, evaluating the 
bilinear map, deciding membership of the groups, deciding equality of group 
elements and sampling generators of the groups. 

There are many ways to set up bilinear groups, both as symmetric bilinear 
groups where $\mathbb{G} = \hat{\mathbb{G}}$ and as asymmetric bilinear groups where $\mathbb{G} \ne \hat{\mathbb{G}}$. Our 
construction works for both symmetric and asymmetric bilinear groups. Currently, 
asymmetric bilinear groups are more efficient and therefore the most appropriate 
choice in practice [GPS08]. 
The q-power knowledge of exponent assumption. The knowledge of exponent 
assumption (KEA) introduced by Damgård [Dam91] says that given 
$G, G' = G^\alpha$ it is infeasible to create $V, V'$ such that $V' = V^\alpha$ without knowing a 
such that $V = G^a$ and $V' = G'^a$. Bellare and Palacio [BP04] extended this to the 
KEA3 assumption, which says that given $G, G^s, G', G'^s$ it is infeasible to create 
$V, V' = V^\alpha$ without knowing $a_0, a_1$ such that $V = G^{a_0}(G^s)^{a_1}$. This assumption 
has also been used in symmetric bilinear groups by Abe and Fehr [AF07], who 
called it the extended knowledge of exponent assumption. 

The q-power knowledge of exponent assumption is a generalization of these 
assumptions in bilinear groups. It says that given $(G, \hat G, G^s, \hat G^s, \dots, G^{s^q}, \hat G^{s^q})$ it is 
infeasible to create $V, \hat V$ such that $e(V, \hat G) = e(G, \hat V)$ without knowing $a_0, \dots, a_q$ 
such that $V = \prod_{i=0}^{q} (G^{s^i})^{a_i}$. The q-power knowledge of exponent assumption 
was introduced in [Gro10] for symmetric bilinear groups using $\hat G = G^\alpha$ with 
$\alpha$ chosen at random. Here we adapt it with minor modifications to the general 
setting where it may be the case that G and $\hat G$ belong to different groups. 

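Definition 4 (q-PKE). The q(λ)-power knowledge of exponent assumption 
holds relative to $\mathcal{G}$ for the class $\mathcal{Z}$ of auxiliary input generators if, for every 
non-uniform polynomial time auxiliary input generator $Z \in \mathcal{Z}$ and non-uniform 
polynomial time adversary A, there exists a non-uniform polynomial time 
extractor $X_A$ such that 

$$\Pr\left[\begin{array}{l} gk := (p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e) \leftarrow \mathcal{G}(1^\lambda);\ G \leftarrow \mathbb{G}^*;\ s \leftarrow \mathbb{Z}_p^*;\\ z \leftarrow Z(gk, G, \dots, G^{s^q});\ \hat G \leftarrow \hat{\mathbb{G}}^*;\\ (V, \hat V; a_0, \dots, a_q) \leftarrow (A \,\|\, X_A)(gk, G, \hat G, \dots, G^{s^q}, \hat G^{s^q}, z) \end{array} : \begin{array}{l} e(V, \hat G) = e(G, \hat V) \\ \wedge\ V \ne \prod_{i=0}^{q} G^{a_i s^i} \end{array}\right] \approx 0.$$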

An adaptation of the proof in Groth [Gro10] shows that the q-PKE assumption 
holds in the generic bilinear group model. 

As demonstrated by Bitansky, Canetti, Paneth and Rosen [BCPR13], if 
indistinguishability obfuscators [BGI+12, GGH+13] exist, then there are auxiliary 
input generators for which the q-PKE assumption does not hold. However, their 
counterexample is specifically tailored to make extraction difficult and, as they 
explain, the q-PKE assumption may hold for "benign" auxiliary input 
generators. We will later use auxiliary input generators that generate group elements 
in $\mathbb{G}$ and $\hat{\mathbb{G}}$ in a specific manner according to the relations $R_\lambda$, and we will 
conjecture that such auxiliary input generators are benign and that the q-PKE 
assumption holds with respect to them. 

The q-power Diffie-Hellman assumption. The q-power Diffie-Hellman 
assumption says that given $(G, \hat G, \dots, G^{s^q}, \hat G^{s^q}, G^{s^{q+2}}, \hat G^{s^{q+2}}, \dots, G^{s^{2q}}, \hat G^{s^{2q}})$ it is hard 
to compute the missing element $G^{s^{q+1}}$. 
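Definition 5 (q-PDH). The q(λ)-power Diffie-Hellman assumption holds 
relative to $\mathcal{G}$ if for all non-uniform probabilistic polynomial time adversaries A 

$$\Pr\left[\begin{array}{l} gk := (p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e) \leftarrow \mathcal{G}(1^\lambda);\ G \leftarrow \mathbb{G}^*;\ \hat G \leftarrow \hat{\mathbb{G}}^*;\ s \leftarrow \mathbb{Z}_p^*;\\ Y \leftarrow A(gk, G, \hat G, \dots, G^{s^q}, \hat G^{s^q}, G^{s^{q+2}}, \hat G^{s^{q+2}}, \dots, G^{s^{2q}}, \hat G^{s^{2q}}) \end{array} : Y = G^{s^{q+1}}\right] \approx 0.$$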


An adaptation of the proof in Groth [Gro10] shows that the q-PDH assumption 
holds in the generic bilinear group model. 

The target group strong Diffie-Hellman assumption. We adapt the 
strong Diffie-Hellman assumption [BB08] in the target group [PHGR13] to the 
asymmetric setting. It says that given $(G, \hat G, \dots, G^{s^q}, \hat G^{s^q})$ it is hard to find an 
$r \in \mathbb{Z}_p$ and compute $e(G, \hat G)^{1/(s - r)}$. 

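Definition 6 (q-TSDH). The q(λ)-target group strong Diffie-Hellman 
assumption holds relative to $\mathcal{G}$ if for all non-uniform probabilistic polynomial time 
adversaries A 

$$\Pr\left[\begin{array}{l} (p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e) \leftarrow \mathcal{G}(1^\lambda);\ G \leftarrow \mathbb{G}^*;\ \hat G \leftarrow \hat{\mathbb{G}}^*;\ s \leftarrow \mathbb{Z}_p;\\ (r, Y) \leftarrow A(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \hat G, \dots, G^{s^q}, \hat G^{s^q}) \end{array} : \begin{array}{l} r \in \mathbb{Z}_p \setminus \{s\} \\ \wedge\ Y = e(G, \hat G)^{\frac{1}{s - r}} \end{array}\right] \approx 0.$$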


An adaptation of the proof in Boneh and Boyen [BB08] shows that the q-TSDH 
assumption holds in the generic bilinear group model. 


3.3 Succinct Perfect NIZK Arguments 

We will now construct succinct and perfect NIZK arguments of knowledge for any 
functions $\ell_u, \ell_w$ and families $\{\mathcal{R}_\lambda\}$ of relations R of pairs $(u, w) \in \{0, 1\}^{\ell_u} \times \{0, 1\}^{\ell_w}$ 
that can be computed by polynomial size circuits with m(λ) wires 
and n(λ) gates for a total size of d(λ) = m(λ) + n(λ). 

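$(\sigma, \tau) \leftarrow \mathsf{Setup}(1^\lambda, R)$: Run $gk := (p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e) \leftarrow \mathcal{G}(1^\lambda)$. Parse $R$ as a boolean circuit $C_R : \{0,1\}^{\ell_u} \times \{0,1\}^{\ell_w} \to \{0,1\}$. Generate a square span program $Q = (v_0(x), \ldots, v_m(x), t(x))$ that verifies $C_R$ over $\mathbb{Z}_p$. Pick $G \leftarrow \mathbb{G}^*$ and $\hat{G}, \tilde{G} \leftarrow \hat{\mathbb{G}}^*$ and $\beta, s \leftarrow \mathbb{Z}_p^*$ such that $t(s) \neq 0$. Return

$$\sigma = (gk, G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d}, \{G^{\beta v_i(s)}\}_{i > \ell_u}, G^{\beta t(s)}, \tilde{G}, \tilde{G}^{\beta}, Q), \qquad \tau = (\sigma, \beta, s).$$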

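$\pi \leftarrow \mathsf{Prove}(\sigma, u, w)$: Parse $u$ as $(a_1, \ldots, a_{\ell_u}) \in \{0,1\}^{\ell_u}$ and use $w$ to compute $a_{\ell_u+1}, \ldots, a_m$ such that $t(x)$ divides $\left(v_0(x) + \sum_{i=1}^m a_i v_i(x)\right)^2 - 1$. Pick $\delta \leftarrow \mathbb{Z}_p$ and let

$$h(x) = \frac{\left(v_0(x) + \sum_{i=1}^m a_i v_i(x) + \delta t(x)\right)^2 - 1}{t(x)}.$$

Use linear combinations of the elements in $\sigma$ to compute

$$H = G^{h(s)}, \quad V_w = G^{\sum_{i > \ell_u} a_i v_i(s) + \delta t(s)}, \quad B_w = G^{\beta\left(\sum_{i > \ell_u} a_i v_i(s) + \delta t(s)\right)}, \quad \hat{V} = \hat{G}^{v_0(s) + \sum_{i=1}^m a_i v_i(s) + \delta t(s)}$$

and return $\pi = (H, V_w, B_w, \hat{V})$.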

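$0/1 \leftarrow \mathsf{Vfy}(\sigma, u, \pi)$: Parse $u$ as $(a_1, \ldots, a_{\ell_u}) \in \{0,1\}^{\ell_u}$ and $\pi$ as $(H, V_w, B_w, \hat{V}) \in \mathbb{G}^3 \times \hat{\mathbb{G}}$. Compute $V = G^{v_0(s) + \sum_{i=1}^{\ell_u} a_i v_i(s)} V_w$ and return 1 if and only if

$$e(V, \hat{G}) = e(G, \hat{V}), \qquad e(H, \hat{G}^{t(s)}) = e(V, \hat{V})\, e(G, \hat{G})^{-1}, \qquad e(V_w, \tilde{G}^{\beta}) = e(B_w, \tilde{G}).$$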

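$\pi \leftarrow \mathsf{Sim}(\tau, u)$: Parse $u$ as $(a_1, \ldots, a_{\ell_u}) \in \{0,1\}^{\ell_u}$ and pick $\delta_w \leftarrow \mathbb{Z}_p$ at random. Let

$$h = \frac{\left(v_0(s) + \sum_{i=1}^{\ell_u} a_i v_i(s) + \delta_w\right)^2 - 1}{t(s)}$$

and return $\pi = (G^{h}, G^{\delta_w}, G^{\beta\delta_w}, \hat{G}^{v_0(s) + \sum_{i=1}^{\ell_u} a_i v_i(s) + \delta_w})$.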
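As a sanity check on the algebra in $\mathsf{Prove}$ and $\mathsf{Sim}$, the Python sketch below computes $h(x) = ((v(x) + \delta t(x))^2 - 1)/t(x)$ by polynomial long division over a toy field and confirms that the division is exact precisely when $v(r_j) = \pm 1$ at every root $r_j$ of $t(x)$. The modulus 97, the roots, the wire evaluations and $\delta = 5$ are hypothetical choices; the real prover works modulo the group order $p$ and only ever uses $h(s)$ in the exponent, through the elements of $\sigma$.

```python
P = 97  # toy prime modulus (assumption; the scheme uses the large group order p)

def poly_mul(a, b):
    """Product of coefficient lists (lowest degree first) over Z_P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def poly_add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [(x + y) % P for x, y in zip(a, b)]

def poly_divmod(num, den):
    """Long division over Z_P; returns (quotient, remainder)."""
    num = num[:]
    lead_inv = pow(den[-1], -1, P)
    quot = [0] * max(len(num) - len(den) + 1, 1)
    for k in range(len(num) - len(den), -1, -1):
        coef = num[k + len(den) - 1] * lead_inv % P
        quot[k] = coef
        for j, dj in enumerate(den):
            num[k + j] = (num[k + j] - coef * dj) % P
    return quot, num[:len(den) - 1]

def interpolate(xs, ys):
    """Lagrange interpolation over Z_P: the polynomial of degree < len(xs) through the points."""
    result = [0]
    for xj, yj in zip(xs, ys):
        basis, denom = [1], 1
        for xm in xs:
            if xm != xj:
                basis = poly_mul(basis, [-xm % P, 1])
                denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P
        result = poly_add(result, [b * scale % P for b in basis])
    return result

def quotient_h(v, t, delta):
    """Return (h, remainder) for h(x) = ((v(x) + delta t(x))^2 - 1) / t(x)."""
    w = poly_add(v, [delta * c % P for c in t])
    return poly_divmod(poly_add(poly_mul(w, w), [P - 1]), t)   # adding P - 1 subtracts 1

roots = [1, 2, 3, 4]                            # hypothetical roots r_1, ..., r_d
t = [1]
for r in roots:
    t = poly_mul(t, [-r % P, 1])                # t(x) = prod_j (x - r_j)

v_ok = interpolate(roots, [1, P - 1, 1, 1])     # v(r_j) in {+1, -1}: every constraint holds
v_bad = interpolate(roots, [1, 2, 1, 1])        # v(r_2) = 2 violates v(r_2)^2 = 1

h, rem_ok = quotient_h(v_ok, t, delta=5)
_, rem_bad = quotient_h(v_bad, t, delta=5)
print(any(rem_ok), any(rem_bad))                # False True
```

The division is exact for `v_ok` because $(v + \delta t)^2 - 1 \equiv v^2 - 1 \pmod{t}$, so $\delta$ never affects divisibility; it only rerandomizes the proof.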

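Let $\mathcal{Z}$ be a family of non-uniform polynomial time auxiliary input generators $Z$ such that each of them corresponds to a sequence of relations $(R_\lambda)_{\lambda \in \mathbb{N}}$ in a family of relations $\{\mathcal{R}_\lambda\}_{\lambda \in \mathbb{N}}$. They work such that $Z$ corresponding to $(R_\lambda)_{\lambda \in \mathbb{N}}$ on input $(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d})$ generates the final part of the common reference string, i.e., returns $z = (\{G^{\beta v_i(s)}\}_{i > \ell_u}, G^{\beta t(s)}, \tilde{G}, \tilde{G}^{\beta}, Q)$.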

Theorem 3. The construction above is a perfect NIZK argument for the family of relations $\{\mathcal{R}_\lambda\}_{\lambda \in \mathbb{N}}$ bounded by $d(\lambda)$ with computational knowledge soundness if the $d(\lambda)$-PKE, $d(\lambda)$-PDH and $d(\lambda)$-TSDH assumptions hold relative to $\mathcal{G}$ and the family of auxiliary input generators $\mathcal{Z}$ defined above.

Proof. Perfect completeness follows by direct verification.

Perfect zero-knowledge follows from observing that both a real argument and a simulated argument have a uniformly random $V_w$ because $t(s) \neq 0$ and $\delta, \delta_w$ are chosen uniformly at random. Once $V_w$ has been fixed, the verification equations uniquely determine $B_w$, $\hat{V}$ and $H$. This means that for any $(u, w) \in R$ both the real arguments and the simulated arguments are chosen uniformly at random such that the verification equations will be satisfied.

We now describe the witness-extractor for computational knowledge soundness. The setup algorithm first generates a bilinear group $(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e) \leftarrow \mathcal{G}(1^\lambda)$ and picks $G \leftarrow \mathbb{G}^*$, $\hat{G} \leftarrow \hat{\mathbb{G}}^*$ and $s \leftarrow \mathbb{Z}_p^*$, which are used to compute $G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d}$. This is exactly like the input given to the auxiliary input generator in a $d$-PKE challenge. The setup algorithm now generates a square span program $Q$ over $\mathbb{Z}_p$ for the relation $R_\lambda$ and elements $\{G^{\beta v_i(s)}\}_{i > \ell_u}$, $G^{\beta t(s)}$, $\tilde{G}$, $\tilde{G}^{\beta}$. We can consider this as part of the auxiliary input $z$ that $Z$ outputs in the $d$-PKE definition. More precisely, let $\mathcal{A}'$ be the $d$-PKE adversary that, on input $(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d})$ and auxiliary input $z = (\{G^{\beta v_i(s)}\}_{i > \ell_u}, G^{\beta t(s)}, \tilde{G}, \tilde{G}^{\beta}, Q)$, runs $(u, H, V_w, B_w, \hat{V}) \leftarrow \mathcal{A}(\sigma)$ with $\sigma = (p, \mathbb{G}, \ldots, \tilde{G}^{\beta}, Q)$ and returns $(V, \hat{V})$, where $V = G^{v_0(s) + \sum_{i=1}^{\ell_u} a_i v_i(s)} V_w$ when $u = (a_1, \ldots, a_{\ell_u}) \in \{0,1\}^{\ell_u}$. Let $\mathcal{X}_{\mathcal{A}'}$ be the corresponding extractor according to the $d$-PKE assumption that returns $c_0, \ldots, c_d$ such that $V = G^{\sum_{i=0}^d c_i s^i}$ when $e(V, \hat{G}) = e(G, \hat{V})$. Our witness-extractor $\mathcal{X}_{\mathcal{A}}$ given $\sigma$ runs

$$(V, \hat{V}; c_0, \ldots, c_d) \leftarrow (\mathcal{A}' \,\|\, \mathcal{X}_{\mathcal{A}'})(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d}, z),$$

which defines a polynomial $\sum_{i=0}^d c_i x^i$. Define $\delta = c_d$ to get a degree $d-1$ polynomial $v(x) = \sum_{i=0}^d c_i x^i - \delta t(x)$. If it is possible to write the polynomial in the form $v(x) = v_0(x) + \sum_{i=1}^m a_i v_i(x)$ such that $(a_1, \ldots, a_m) \in \{0,1\}^m$ is a satisfying assignment for the circuit $C_R$ with $u = (a_1, \ldots, a_{\ell_u})$, then the extractor returns the corresponding witness $w$.

We will now show that with all but negligible probability the extracted polynomial $v(x)$ does indeed provide a valid witness $w \in \{0,1\}^{\ell_w}$ such that $(u, w) \in \mathcal{R}_\lambda$. Let $Q$ be the square span program $(v_0(x), \ldots, v_m(x), t(x))$ specified in $\sigma$ that verifies $R_\lambda$ over $\mathbb{Z}_p$. We know by Theorem 2 that if $t(x)$ divides $v(x)^2 - 1$ and

$$v_{\mathrm{mid}}(x) = \sum_{i=0}^d c_i x^i - v_0(x) - \sum_{i=1}^{\ell_u} a_i v_i(x)$$

belongs to the span of $\{v_i(x)\}_{i > \ell_u}$, then indeed $w \in \{0,1\}^{\ell_w}$ and $(u, w) \in \mathcal{R}_\lambda$. So in the following we show that the two failure cases, $t(x)$ not dividing $v(x)^2 - 1$ or $v_{\mathrm{mid}}(x)$ not lying in the appropriate span, each happen with negligible probability, since otherwise we could break the $d$-TSDH assumption or the $d$-PDH assumption, respectively.

Given a $d$-TSDH challenge $(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d})$, we pick $\beta \leftarrow \mathbb{Z}_p^*$ and roots $r_1, \ldots, r_d$ in the same way the setup algorithm does and simulate a common reference string $\sigma$. Suppose the adversary and extractor return $u = (a_1, \ldots, a_{\ell_u}) \in \{0,1\}^{\ell_u}$, a valid proof $\pi = (H, V_w, B_w, \hat{V})$ and $c_0, \ldots, c_d$ such that $V = G^{v_0(s) + \sum_{i=1}^{\ell_u} a_i v_i(s)} V_w = G^{\sum_{i=0}^d c_i s^i}$. Let $v(x) = \sum_{i=0}^d c_i x^i - \delta t(x)$ with $\delta = c_d$ as before, define $p(x) = (v(x) + \delta t(x))^2 - 1$, and suppose $p(x)$ is not divisible by $t(x)$. Let $r_i$ be a root of $t(x)$ such that $x - r_i$ does not divide $p(x)$. We can write $p(x) = a(x)(x - r_i) + b$, where $a(x)$ is a degree $2d - 1$ polynomial in $\mathbb{Z}_p[x]$ and $b \in \mathbb{Z}_p^*$. The verification equation $e(H, \hat{G}^{t(s)}) = e(V, \hat{V})\, e(G, \hat{G})^{-1}$ gives us

$$e\left(H, \hat{G}^{\frac{t(s)}{s - r_i}}\right) = e(G, \hat{G})^{a(s) + \frac{b}{s - r_i}}.$$

Since $t(x)/(x - r_i)$ is a polynomial of degree $d - 1$ and $a(x)$ has degree $2d - 1$, generic group operations on the $d$-TSDH challenge suffice to compute $\hat{G}^{t(s)/(s - r_i)}$ and $e(G, \hat{G})^{a(s)}$, which allows us to deduce $e(G, \hat{G})^{\frac{b}{s - r_i}}$. Raising this to the power $b^{-1}$ gives a solution $\left(r_i, e(G, \hat{G})^{\frac{1}{s - r_i}}\right)$ to the $d$-TSDH challenge.

Given a $d$-PDH challenge $(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d}, G^{s^{d+2}}, \hat{G}^{s^{d+2}}, \ldots, G^{s^{2d}}, \hat{G}^{s^{2d}})$, we pick a random degree $d - 1$ polynomial $a(x)$ such that $a(x)v_i(x)$ has coefficient 0 for $x^d$ for all $\ell_u < i \leq m$ and $a(x)t(x)$ also has coefficient 0 for $x^d$. There are $d + \ell_u - m - 1 > 0$ degrees of freedom in choosing $a(x)$, so for a polynomial $v_{\mathrm{mid}}(x)$ outside the span of $\{v_i(x)\}_{i > \ell_u}$ and $t(x)$ the polynomial $a(x)v_{\mathrm{mid}}(x)$ has a random coefficient for $x^d$.

Now pick $b \leftarrow \mathbb{Z}_p$ at random, define $\beta(x) = a(x)x + b$ and let $\beta = \beta(s)$. Observe that $G^{\beta v_i(s)} = G^{(a(s)s + b)v_i(s)}$ can be constructed from our challenge without knowing $G^{s^{d+1}}$, and the same goes for $G^{\beta t(s)}$. Pick $\rho \leftarrow \mathbb{Z}_p^*$ at random and compute $\tilde{G} = \hat{G}^{\rho t(s)}$ and $\tilde{G}^{\beta} = \hat{G}^{\rho\beta(s)t(s)}$. Give to the adversary a simulated common reference string

$$(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d}, \{G^{\beta v_i(s)}\}_{i > \ell_u}, G^{\beta t(s)}, \tilde{G}, \tilde{G}^{\beta}, Q).$$


Suppose the adversary and extractor return $u = (a_1, \ldots, a_{\ell_u}) \in \{0,1\}^{\ell_u}$, a valid proof $\pi = (H, V_w, B_w, \hat{V})$ and $c_0, \ldots, c_d$ such that $V = G^{\sum_{i=0}^d c_i s^i}$. Define $v_{\mathrm{mid}}(x) = \sum_{i=0}^d c_i x^i - v_0(x) - \sum_{i=1}^{\ell_u} a_i v_i(x)$. Due to the random choice of $b$, the value $\beta(s) = a(s)s + b$ does not reveal anything about $a(x)$, so if $v_{\mathrm{mid}}(x)$ is outside the span of $\{v_i(x)\}_{i > \ell_u}$ and $t(x)$, then $a(x)v_{\mathrm{mid}}(x)$ has a random coefficient for $x^d$. With probability $1 - \frac{1}{p}$ this means the adversary returns $B_w = G^{\beta(s)v_{\mathrm{mid}}(s)}$, where $\beta(x)v_{\mathrm{mid}}(x) = \sum_{i=0}^{2d} b_i x^i$ is a known polynomial with a non-trivial coefficient $b_{d+1} \neq 0$ for $x^{d+1}$. We can now take an appropriate linear combination of $B_w$ and the elements $G, \ldots, G^{s^d}, G^{s^{d+2}}, \ldots, G^{s^{2d}}$ to compute $G^{s^{d+1}}$, which solves the $d$-PDH challenge. □

The proof of Theorem 3 suffers a computational overhead in the reduction by using an extractor $\mathcal{X}_{\mathcal{A}'}$ for $\mathcal{A}'$. Except for this computational overhead, the security reduction for knowledge soundness is tight. It is possible to eliminate the q-TSDH assumption and rely solely on the q-PKE and q-PDH assumptions, but then the security reduction loses a factor $q$ and is therefore not tight.


3.4 Efficiency 

In this section, we assume our NIZK argument is instantiated with the square span program that we constructed in Section 2.2. This choice of square span program enables a number of optimizations that make the argument highly efficient.

The prover has to compute

$$V_w = G^{\sum_{i > \ell_u} a_i v_i(s) + \delta t(s)}, \qquad B_w = G^{\beta\left(\sum_{i > \ell_u} a_i v_i(s) + \delta t(s)\right)}, \qquad \hat{V} = \hat{G}^{v_0(s) + \sum_{i=1}^m a_i v_i(s) + \delta t(s)}.$$

It is possible to compute the polynomials $\sum_{i > \ell_u} a_i v_i(x) + \delta t(x)$ and $v_0(x) + \sum_{i=1}^m a_i v_i(x) + \delta t(x)$ and then compute the appropriate exponentiations of the polynomials evaluated in $s$ using the elements $G, \hat{G}, \ldots, G^{s^d}, \hat{G}^{s^d}$, $\{G^{\beta v_i(s)}\}_{i > \ell_u}$ and $G^{\beta t(s)}$ from the common reference string. However, this requires $O(d)$ exponentiations to the coefficients of the polynomials. Following [GGPR13], a significant saving can be made by precomputing $\{G^{v_i(s)}\}_{i > \ell_u}$, $G^{t(s)}$ and $\{\hat{G}^{v_i(s)}\}_{i \geq 0}$, $\hat{G}^{t(s)}$. Since each $a_i \in \{0,1\}$, this makes it possible to compute $V_w$, $B_w$ and $\hat{V}$ using at most $3m + 1 - 2\ell_u$ multiplications and 3 exponentiations. (Pragmatically, taking advantage of our uniform support for all gates, we can profile the SSP and ‘flip’ internal values from $a_i$ to $\bar{a}_i$ to ensure that $a_i$ is more often equal to 0 than to 1, thereby on average performing fewer than half of those multiplications.)

The prover also has to compute $H = G^{h(s)}$, where $h(x) = \frac{(v(x) + \delta t(x))^2 - 1}{t(x)}$ with $v(x) = v_0(x) + \sum_{i=1}^m a_i v_i(x)$ and $t(x) = \prod_{i=1}^d (x - r_i)$. We can evaluate $h(x)$ in
$d$ points $r'_1, \ldots, r'_d$ using two discrete Fourier transforms as follows. The degree $d - 1$ polynomial $v(x)$ is uniquely determined by its evaluation in the $d$ points $r_1, \ldots, r_d$. In our square span program, the evaluations in the points $r_1, \ldots, r_d$ can be computed easily given the values of the wires in the circuit. Using an inverse discrete Fourier transform, we compute the coefficients of $v(x) = \sum_{i=0}^{d-1} c_i x^i$. Let $\gamma \in \mathbb{Z}_p^*$ be given such that $r'_1, \ldots, r'_d$ defined as $r'_j = \gamma r_j$ gives us $2d$ distinct values $r_1, \ldots, r_d, r'_1, \ldots, r'_d$. Compute $c'_i = \gamma^i c_i$ to get the coefficients of the polynomial $v'(x) = \sum_{i=0}^{d-1} c'_i x^i$ and use a discrete Fourier transform to evaluate $v'(x)$ in $r_1, \ldots, r_d$. This gives us evaluations of $v(x)$ in the points $r'_1, \ldots, r'_d$, since $v(r'_j) = v'(r_j)$. We have $h(x) = \frac{v(x)^2 - 1}{t(x)} + 2\delta v(x) + \delta^2 t(x)$. Assuming $t(r'_1)^{-1}, \ldots, t(r'_d)^{-1}$ have been precomputed, it only costs $3d$ multiplications in $\mathbb{Z}_p$ to evaluate $\frac{v(x)^2 - 1}{t(x)} + 2\delta v(x)$ in the $d$ points $r'_1, \ldots, r'_d$. Using Lagrange interpolation in the exponent, this allows us to compute


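$$G^{\frac{v(s)^2 - 1}{t(s)} + 2\delta v(s)} = \prod_{j=1}^{d} \left(G^{L_j(s)}\right)^{\frac{v(r'_j)^2 - 1}{t(r'_j)} + 2\delta v(r'_j)},$$

where $L_j(x)$ is the Lagrange basis polynomial for $r'_j$. By multiplying with $(G^{t(s)})^{\delta^2}$ we then get $G^{h(s)}$.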
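The following Python sketch checks the coset-evaluation trick, $v(\gamma r_j) = v'(r_j)$ with $c'_i = \gamma^i c_i$, using naive $O(d^2)$ transforms over a toy field. The prime 97, $d = 8$ and the choice of the $r_j$ as the $d$-th roots of unity are assumptions for illustration; with that choice $t(x) = x^d - 1$, so $t(r'_j) = \gamma^d - 1$ for every $j$ and a single inverse replaces the $d$ precomputed ones. A real implementation would use an FFT in place of `dft`.

```python
P = 97   # toy prime with 8 | P - 1, so 8th roots of unity exist (assumption)
D = 8    # number of SSP constraints d (a power of two in this sketch)

def root_of_unity(d):
    """An element of exact order d in Z_P^* (d a power of two dividing P - 1)."""
    for g in range(2, P):
        w = pow(g, (P - 1) // d, P)
        if pow(w, d // 2, P) != 1:
            return w
    raise ValueError("no primitive d-th root of unity")

def dft(coeffs, w):
    """Naive O(d^2) DFT: evaluate the polynomial at w^0, ..., w^(d-1)."""
    d = len(coeffs)
    return [sum(c * pow(w, i * j, P) for i, c in enumerate(coeffs)) % P for j in range(d)]

def inverse_dft(values, w):
    """Coefficients of the unique polynomial of degree < d with the given values at w^j."""
    d_inv = pow(len(values), -1, P)
    return [d_inv * x % P for x in dft(values, pow(w, -1, P))]

def evaluate(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

w = root_of_unity(D)
roots = [pow(w, j, P) for j in range(D)]                    # r_j = w^j, so t(x) = x^d - 1
wire_values = [1, P - 1, 1, 1, P - 1, P - 1, 1, 1]          # hypothetical v(r_j) in {+1, -1}
c = inverse_dft(wire_values, w)                             # coefficients c_i of v(x)

gamma = next(g for g in range(2, P) if pow(g, D, P) != 1)   # gamma^d != 1, so gamma*r_j avoids every r_k
c_shift = [pow(gamma, i, P) * ci % P for i, ci in enumerate(c)]   # c'_i = gamma^i c_i
v_at_coset = dft(c_shift, w)                                # v'(r_j) = v(gamma r_j) = v(r'_j)
assert v_at_coset == [evaluate(c, gamma * r % P) for r in roots]

# With t(x) = x^d - 1, every t(r'_j) equals gamma^d - 1, so one inverse suffices.
delta = 5                                                   # hypothetical prover randomness
t_inv = pow(pow(gamma, D, P) - 1, -1, P)
h_part = [((y * y - 1) * t_inv + 2 * delta * y) % P for y in v_at_coset]   # 3 multiplications per point
```

The values in `h_part` are exactly the exponents that the Lagrange interpolation step raises the precomputed $G^{L_j(s)}$ to.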

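To speed up the computation, we can set up a modified common reference string for the prover

$$\sigma_{\mathsf{Prove}} = \left(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \hat{G}, \{G^{v_i(s)}\}_{i > \ell_u}, \{G^{\beta v_i(s)}\}_{i > \ell_u}, G^{t(s)}, G^{\beta t(s)}, \{\hat{G}^{v_i(s)}\}_{i \geq 0}, \hat{G}^{t(s)}, \{G^{L_j(s)}\}_{j=1}^{d}, Q\right),$$

with $L_j(x)$ the Lagrange basis polynomials for $r'_1, \ldots, r'_d$ used above.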

The computational cost for the prover is dominated by $d$ exponentiations in $\mathbb{G}$ and 2 discrete Fourier transforms in $\mathbb{Z}_p$. The two discrete Fourier transforms cost $O(d \log^2 d)$ multiplications in general, but the computation can be reduced to $O(d \log d)$ multiplications when $\mathbb{Z}_p$ is of a form amenable to using the fast Fourier transform.

The verifier needs to compute $V = G^{v_0(s) + \sum_{i=1}^{\ell_u} a_i v_i(s)} V_w$ and evaluate three pairing product equations $e(V, \hat{G}) = e(G, \hat{V})$, $e(H, \hat{G}^{t(s)}) = e(V, \hat{V})\, e(G, \hat{G})^{-1}$, and $e(V_w, \tilde{G}^{\beta}) = e(B_w, \tilde{G})$. The verifier does not need the full common reference string but can use a more compact common reference string

$$\sigma_{\mathsf{Vfy}} = \left(p, \mathbb{G}, \hat{\mathbb{G}}, \mathbb{G}_T, e, G, \{G^{v_i(s)}\}_{i=0}^{\ell_u}, \hat{G}, \hat{G}^{t(s)}, \tilde{G}, \tilde{G}^{\beta}\right),$$

which only has $\ell_u + 6$ group elements.³ Verification is also computationally efficient: in the worst case it requires $\ell_u + 1$ multiplications in $\mathbb{G}$, one multiplication in $\mathbb{G}_T$ and 6 pairings if we precompute $e(G, \hat{G})^{-1}$.

For a large circuit, the cost of verification can be much smaller than the cost of evaluating the circuit itself, even if the witness $w$ is known to the verifier. This makes the NIZK argument a succinct non-interactive argument of knowledge that is suitable for verifiable computation protocols.⁴

³ Using the binary representation of the public input $u$ from [PHGR13], this can be further reduced to … + $O(1)$ group elements.

Table 3. Proof size and verification cost comparison with Pinocchio. Size in number of group elements (either $\mathbb{G}$ or $\hat{\mathbb{G}}$); performance in terms of pairings ($P$) or multiplications in $\mathbb{G}$ or $\mathbb{G}_T$, respectively.

                        Proof size (elements)   Verification cost
  Pinocchio [PHGR13]    8                       14P + (ℓ_u + 4)G + 1G_T
  This work             4                       6P + (ℓ_u + 1)G + 1G_T

Partly due to the lack of benchmarks, it is hard to compare the performance of SNARK protocols quantitatively without carefully reimplementing them. Table 3 compares the proof sizes and the operations performed by the verifier in our protocol and in Pinocchio, arguably the state of the art in terms of proof size and verification speed for QAPs. On this basis and the numbers reported in [PHGR13], we conservatively estimate that an SSP implementation based on the Pinocchio library would offer 160-byte proofs verified in less than 6 ms.

4 Conclusion 

We introduce a representation of logic circuits, or predicates on propositional 
formulae, using quadratic constraints on an affine map. The map is built using a 
linearization of each gate, and a set of constraints to ensure all values of wires are 
binary. This leads to a simple and elegant formulation of square span programs, 
and in turn to efficient, minimalistic constructions for NIZKs and SNARKs. 

The simplifications are twofold: (i) our representation of boolean functions no longer requires wire checkers, and (ii) square span programs consist of only a single set of polynomials that are summed and squared. The former improves prover efficiency, while the key advantage of the latter is that it yields SNARKs with an extremely compact proof, consisting of only four group elements, and an efficient verification procedure compared to more generic QSP characterisations of the same program.

As can be expected, binary programs such as SSPs remain less efficient than 
arithmetic programs for verifying computations on integers, involving e.g. 32-bit 
additions and multiplications. Those operations have to be encoded as binary 
adders and multipliers, leading to a significant blow-up in circuit size and com- 
putation costs for the prover. It remains an open problem how to extend the SSP 
approach with ideas from QAPs to verify such computations without sacrificing 
its conceptual simplicity and short proofs. 

4 In some cases, for instance when outsourcing computation, the verifier may be the one that sets up the common reference string. In that case the verifier may know $\beta$ and $s$, which can further decrease the cost of verification.
Abstract. Lattice problems are an attractive basis for cryptographic systems be-
cause they seem to offer better security than discrete logarithm and factoring 
based problems. Efficient lattice-based constructions are known for signature and 
encryption schemes. However, the constructions known for more sophisticated 
schemes such as group signatures are still far from being practical. In this paper 
we make a number of steps towards efficient lattice-based constructions of more 
complex cryptographic protocols. First, we provide a more efficient way to prove 
knowledge of plaintexts for lattice-based encryption schemes. We then show how 
our new protocol can be combined with a proof of knowledge for Pedersen com- 
mitments in order to prove that the committed value is the same as the encrypted 
one. Finally, we make use of this to construct a new group signature scheme that 
is a “hybrid” in the sense that privacy holds under a lattice-based assumption 
while security is discrete-logarithm-based. 

Keywords: Verifiable Encryption, Group Signatures, Zero-Knowledge Proofs for Lattices.


1 Introduction 


There has been a remarkable increase of research in the field of lattice-based cryptography over the past few years. This renewed attention is largely due to a number of exciting results showing how cryptographic primitives such as fully homomorphic encryption [21] and multilinear maps [20] can be built from lattices, while no such instantiations are known based on more traditional problems such as factoring or discrete logarithms. Lattice problems are also attractive for building standard primitives such as encryption and signature schemes, however, because of their strong security properties. In particular, their worst-case to average-case reductions as well as their apparent resistance against quantum computers set them apart from traditional cryptographic assumptions such as factoring or computing discrete logarithms, especially in situations that require security many years or even decades into the future.


* The research leading to these results has received partial funding from the European Commission under the Seventh Framework Programme (CryptoCloud #339563, PERCY #321310, FutureID #318424) and from the French ANR-13-JS02-0003 CLE Project.
Long-term integrity requirements, e.g., for digital signatures, can usually be fulfilled
by re-signing documents when new, more secure signature schemes are proposed. The 
same approach does not work, however, for privacy requirements, e.g., for encryption or 
commitment schemes, because the adversary may capture ciphertexts or commitments 
now and store them until efficient attacks on the schemes are found. 

Several lattice-based encryption schemes have been proposed in the literature, e.g., [22, 24, 30, 35], but many of their applications in more complex primitives require efficient zero-knowledge proofs of the encrypted plaintext. Examples include optimistic fair exchange [2], non-interactive zero-knowledge proofs [34], multiparty computation [?], and group signatures [?]. In this paper, we present a more efficient zero-knowledge proof for lattice-based encryption schemes. We then combine it with a non-lattice-based signature scheme to build a group signature scheme with privacy under lattice assumptions in the random-oracle model.


1.1 Improved Proofs of Plaintext Knowledge for Lattice Schemes 

In a zero-knowledge proof of plaintext knowledge, the encryptor wants to prove in zero-knowledge that the ciphertext is of the correct form and that he knows the message. Efficient constructions of these primitives are known based on number-theoretic hardness assumptions such as discrete log, strong RSA, etc.

Encryptions in lattice-based schemes generally have the form $t = Ae \bmod q$, where $A$ is some public matrix and $e$ is the unique vector with small coefficients that satisfies the equation (in this general example, we are lumping the message with $e$). A proof that $t$ is a valid ciphertext (and also a proof of plaintext knowledge), therefore, involves proving that one knows a short $e$ such that $Ae = t$. It is currently known how to accomplish this task in two ways. The first uses a “Stern-type” protocol [?] in which every run has soundness error 2/3 [?]. It does not seem possible to improve this protocol since some steps in it are inherently combinatorial and non-algebraic.

A second possible approach is to use the Fiat-Shamir approach for lattices using rejection sampling introduced in [27, 28]. But while the latter leads to fairly efficient Fiat-Shamir signatures, there are some barriers to obtaining a proof of knowledge. What one is able to extract from a prover is a short vector $e'$ such that $Ae' = tc$ for some integer $c$, which implies that $Ae'c^{-1} = t$. Unfortunately, this does not imply that $e'c^{-1}$ is short unless $c = \pm 1$. This is the main way in which lattice-based Fiat-Shamir proofs differ from traditional schemes like the discrete-log based Schnorr protocol. In the latter, it is enough to extract any discrete log, whereas in lattice protocols, one must extract a short vector. Thus, the obvious approach to Fiat-Shamir proofs of knowledge for lattices (i.e., using binary challenge vectors) also leads to protocols with soundness error 1/2.
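The extraction obstacle can be seen on a toy example. In the Python sketch below, the matrix, the short vector $e'$, the modulus $q = 12289$ and $c = 2$ are all hypothetical; $e'c^{-1} \bmod q$ is a perfectly valid preimage of $t$, yet its centered coefficients are close to $q/2$, so it says nothing about a short plaintext.

```python
import random

q = 12289                                   # toy modulus (assumption)
rng = random.Random(0)
A = [[rng.randrange(q) for _ in range(5)] for _ in range(3)]   # hypothetical public matrix
e_prime = [1, -2, 0, 3, -1]                 # short vector extracted from the prover
c = 2                                       # challenge difference other than +-1

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) % q for row in M]

def centered(x):
    """Representative of x mod q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

c_inv = pow(c, -1, q)
t = [v * c_inv % q for v in matvec(A, e_prime)]     # chosen so that A e' = t c (mod q)
candidate = [centered(x * c_inv) for x in e_prime]  # e' c^{-1} mod q

assert matvec(A, candidate) == t                    # a valid preimage of t ...
print(max(abs(x) for x in e_prime), max(abs(x) for x in candidate))   # 3 6144
```

Every odd entry of $e'$ is mapped to roughly $\pm q/2$ because $2^{-1} \equiv 6145 \pmod{12289}$, which is exactly why extraction needs $c = \pm 1$ (or, in the ring setting, an invertible $c$ with a short inverse).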

Things do not improve for lattice-based proofs of knowledge even if one considers ideal lattices. Even if $A$ and $e$ are a matrix and a vector of polynomials in the ring $\mathbb{Z}_q[X]/(X^n + 1)$, and $c$ is now a polynomial (as in the Ring-LWE based Fiat-Shamir schemes in [28, ?]), then one can again extract only a short $e'$ such that $Ae'c^{-1} = t$. In this case, not only is $c^{-1}$ not necessarily short, but it does not even necessarily exist (since the polynomial $X^n + 1$ can factor into up to $n$ terms). At this point, we are not aware of any techniques that reduce the soundness error of protocols that prove plaintext knowledge of lattice encryptions, except by (parallel) repetition.
In a recent work, Damgård et al. [18] gave an improved amortized proof of plaintext knowledge for LWE encryptions. Their protocol allows one to prove knowledge of $O(k)$ plaintexts (where $k$ is the security parameter) in essentially the same time as just one plaintext. The ideas behind their protocol do not seem to reduce the time required for proving just one plaintext, nor do they apply to Ring-LWE based encryption schemes. In particular, Ring-LWE based schemes are able to encrypt $O(n)$ plaintext bits into one (or two) polynomials, which is often all that is needed. Yet, the techniques in [18] do not seem to be helpful here. The reason is that the challenge matrix required in [18] needs to be of a particular form and cannot simply be a ring element in $\mathbb{Z}_q[X]/(X^n + 1)$.

In this paper, we show that one can reduce the soundness error of lattice-based zero-knowledge proofs of knowledge for ciphertext validity from 1/2 to $1/(2n)$, which in practice decreases the number of required iterations of the protocol by a factor of approximately 10. Interestingly, our techniques only work for ideal lattices, and we do not know how to adapt them to general ones. The key observation is that, when working over the ring $\mathbb{Z}[X]/(X^n + 1)$, the quantity $2/(X^i - X^j)$ for all $0 \leq i \neq j < n$ is a polynomial with coefficients in $\{-1, 0, 1\}$, cf. Section ?.
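The key observation can be checked numerically for small $n$. The Python sketch below solves $(X^i - X^j)\,y = 2$ exactly over $\mathbb{Q}[X]/(X^n + 1)$ by Gauss-Jordan elimination and verifies that every coefficient of $y$ lies in $\{-1, 0, 1\}$. It assumes $n$ is a power of two, so that $X^n + 1$ is irreducible over $\mathbb{Q}$ and every $X^i - X^j$ with $i \neq j$ is invertible; it is a brute-force sanity check, not a proof.

```python
from fractions import Fraction
from itertools import permutations

def mult_matrix(f, n):
    """Matrix M such that M @ y gives the coefficients of f * y in Q[X]/(X^n + 1)."""
    M = [[Fraction(0)] * n for _ in range(n)]
    for col in range(n):                  # column col holds the coefficients of f * X^col
        for k, fk in enumerate(f):
            d = k + col
            if d < n:
                M[d][col] += fk
            else:
                M[d - n][col] -= fk       # X^n = -1
    return M

def solve(M, b):
    """Gauss-Jordan elimination over the rationals (M assumed invertible)."""
    n = len(M)
    A = [row[:] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [x / piv for x in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    return [A[r][n] for r in range(n)]

def two_over_diff(i, j, n):
    """Coefficients of 2 / (X^i - X^j) in Q[X]/(X^n + 1)."""
    f = [0] * n
    f[i], f[j] = 1, -1
    return solve(mult_matrix(f, n), [2] + [0] * (n - 1))

for n in (2, 4, 8):
    for i, j in permutations(range(n), 2):
        assert set(two_over_diff(i, j, n)) <= {-1, 0, 1}

print([int(x) for x in two_over_diff(0, 3, 4)])   # [1, 1, -1, 1]
```

For example, $2/(1 - X^3) = 1 + X - X^2 + X^3$ in $\mathbb{Z}[X]/(X^4 + 1)$, which can be confirmed by multiplying out and using $X^4 = -1$.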

This immediately allows us to prove that, given $A$ and $t$, we know a vector of short polynomials $e$ such that $Ae = 2t$. While this is not quite the same as proving that $Ae = t$, it is good enough for most applications, since it still allows us to prove knowledge of the plaintext. This result immediately gives improvements in all schemes that require such a proof of knowledge for Ring-LWE based encryption schemes, such as the ring version of the “dual” encryption scheme [22], the “two element” scheme of Lyubashevsky et al. [30], and NTRU [24, 35].


1.2 Linking Lattice-Based and Classical Primitives 

A main step in applying our new proof protocol to construct a “hybrid” group signature scheme is to prove that two primitives, one based on classical cryptography and the other on lattices, are committing to the same message (and that the prover knows that message). In our application, we will use the perfectly hiding Pedersen commitment scheme as the classical primitive, and a Ring-LWE encryption scheme as the lattice-based primitive.

While the Pedersen commitment and the lattice-based encryption scheme work over different rings, we show that we can still perform operations “in parallel” on the two. For example, if the message is $\mu_0, \mu_1, \ldots, \mu_{n-1}$, then it is encrypted in Ring-LWE schemes by encrypting the polynomial $\mu = \mu_0 + \mu_1 X + \ldots + \mu_{n-1}X^{n-1}$, and each $\mu_i$ is committed to individually using a Pedersen commitment. We will then want to prove that a Ring-LWE encryption of $\mu$ encrypts the same thing as $n$ Pedersen commitments of the $\mu_i$'s. Even though the two computations are performed over different rings, we show that by mimicking polynomial multiplications over a polynomial ring by appropriate additions and multiplications of coefficients in exponents, we can use our previously mentioned proof of plaintext knowledge to both prove knowledge of $\mu$ and show that the Pedersen commitments are committing to the coefficients of the same $\mu$. One reason enabling such a proof is that the terms dealing with $\mu$ (and $\mu_i$) in the proof of knowledge are done “over the integers”, that is, no modular reduction needs to be performed on these terms. We describe this protocol in detail in Section ?.
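The “in parallel” idea can be illustrated with a small Python sketch. Given one Pedersen commitment per coefficient $\mu_j$ and a public ring element $a$, commitments to the coefficients of $a\mu \in \mathbb{Z}[X]/(X^n + 1)$ follow from exponentiations and multiplications of the commitments alone, since each coefficient of $a\mu$ is a signed integer combination of the $\mu_j$. The group (the order-1019 subgroup of $\mathbb{Z}_{2039}^*$, written multiplicatively), the generators, $n = 4$ and all coefficients are toy assumptions; this shows only the homomorphic bookkeeping, not the zero-knowledge protocol.

```python
import random

P, Q = 2039, 1019            # toy safe prime P = 2Q + 1 (assumption; insecure size)
G, H = 4, 9                  # squares mod P, hence generators of the order-Q subgroup

def commit(m, r):
    """Pedersen commitment m*g + r*h, written multiplicatively as g^m h^r mod P."""
    return pow(G, m % Q, P) * pow(H, r % Q, P) % P

def negacyclic_mul(a, b):
    """Integer product a*b in Z[X]/(X^n + 1), without any modular reduction."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                out[i + j] += a[i] * b[j]
            else:
                out[i + j - n] -= a[i] * b[j]      # X^n = -1
    return out

rng = random.Random(1)
n = 4
mu = [rng.randint(-1, 1) for _ in range(n)]      # hypothetical message coefficients
rnd = [rng.randrange(Q) for _ in range(n)]       # commitment randomness
cmts = [commit(m, r) for m, r in zip(mu, rnd)]   # one commitment per coefficient
a = [rng.randint(-2, 2) for _ in range(n)]       # hypothetical public ring element

derived = []
for k in range(n):
    acc = 1
    for j in range(n):
        idx, sign = (k - j, 1) if k - j >= 0 else (k - j + n, -1)   # wrap-around contributes -1
        acc = acc * pow(cmts[j], (sign * a[idx]) % Q, P) % P
    derived.append(acc)

# derived[k] commits to coefficient k of a*mu, with randomness transformed by the same linear map
expected = [commit(c, r) for c, r in zip(negacyclic_mul(a, mu), negacyclic_mul(a, rnd))]
assert derived == expected
```

Because the coefficients of $a\mu$ are computed over the integers and stay far below $Q/2$, the value committed in `derived[k]` is the exact integer coefficient, mirroring the “no modular reduction” property mentioned above.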
1.3 Applications to Group Signatures and Credentials

Group signatures [?] are schemes that allow members of a group to sign messages on behalf of the group without revealing their identity. In case of a dispute, the group manager can lift a signer’s anonymity and reveal his identity. Currently known group signatures based on lattice assumptions are mainly proofs of concept, rather than practically useful schemes. The schemes by Gordon et al. [?] and Camenisch et al. [?] have signature size linear in the number of group members. The scheme due to Laguillaumie et al. [?] performs much better asymptotically, with signature sizes logarithmic in the number of group members, but, as the authors admit, instantiating it with practical parameters would lead to very large keys and signatures. This is in contrast to classical number-theoretic solutions, where both the key and the signature size are constant for arbitrarily many group members.

One can argue that the privacy requirement for group signatures is a concern that is 
more long-term than traceability (i.e., unforgeability), because when traceability turns 
out to be broken, verifiers can simply stop accepting signatures for the broken scheme. 
When privacy is broken, however, an adversary can suddenly reveal the signers behind 
all previous signatures. Users may only be willing to use a group signature scheme 
if their anonymity is guaranteed for, say, fifty years in the future. It therefore makes 
sense to provide anonymity under lattice-based assumptions, while this is less crucial 
for traceability. 

Following this observation, we propose a “hybrid” group signature scheme, where 
unforgeability holds under classical assumptions, while privacy is proved under lattice- 
based ones. This allows us to combine the flexible tools that are available in the classical 
framework with the strong privacy guarantees of lattice problems. Our group signature 
scheme has keys and signatures of size logarithmic in the number of group members; 
for practical choices of parameters and realistic numbers of group members, the sizes 
will even be independent of the number of users. Furthermore, by basing our scheme 
on ring-LWE and not standard LWE, we partially solve an open problem stated in [?].

Our construction follows a variant of a generic approach that we believe is folklore, as it underlies several direct constructions in the literature [?, 8] and was described explicitly by Chase and Lysyanskaya [?]. When joining the group, a user obtains a certificate from the group manager that is a signature on his identity under the group manager’s public key. To sign a message, the user encrypts his identity under the manager’s public encryption key, and then issues a signature proof of knowledge that he possesses a valid signature on the encrypted plaintext. We adapt this general paradigm with some modifications to better fit the specifics of our proof of plaintext knowledge for lattice encryption. To the best of our knowledge, however, the construction was never proved secure, so our proof can be seen as a contribution of independent interest.


2 Preliminaries 

In this section, we informally introduce several notions. Formal definitions and proofs 
can be found in the full version. 
2.1 Notation

We denote algorithms by sans-serif letters such as $\mathsf{A}, \mathsf{B}$. If $S$ is a set, we write $s \xleftarrow{\$} S$ to denote that $s$ was drawn uniformly at random from $S$. Similarly, we write $y \leftarrow \mathsf{A}(x)$ if $y$ was computed by a randomized algorithm $\mathsf{A}$ on input $x$, and $d \leftarrow D$ for a probability distribution $D$, if $d$ was drawn according to $D$. When we make the random coins $\rho$ of $\mathsf{A}$ explicit, we write $y \leftarrow \mathsf{A}(x; \rho)$.

We write $\Pr[\mathcal{E} : \Omega]$ to denote the probability of event $\mathcal{E}$ over the probability space $\Omega$. For instance, $\Pr[x = y : x, y \leftarrow D]$ denotes the probability that $x = y$ if $x, y$ were drawn according to a distribution $D$.

We identify the vectors $(a_0, \ldots, a_{n-1})$ with the polynomial $a_0 + a_1 X + \cdots + a_{n-1}X^{n-1}$. If $v$ is a vector, we denote by $\|v\|$ its Euclidean norm, by $\|v\|_\infty$ its infinity norm, and by $v^{\circlearrowleft l}$ an anti-cyclic shift of $v$ by $l$ positions, corresponding to a multiplication by $X^l$ in $R_q = \mathbb{Z}_q[X]/(X^n + 1)$. That is,

$$(v_0, \ldots, v_{n-1})^{\circlearrowleft l} = (-v_{n-l}, \ldots, -v_{n-1}, v_0, \ldots, v_{n-l-1}).$$

Throughout the paper, $\lambda$ denotes the main security parameter and $\varepsilon$ denotes the empty string.
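For concreteness, here is a hypothetical Python helper for the anti-cyclic shift on coefficient lists, cross-checked against schoolbook multiplication by $X^l$ in $\mathbb{Z}[X]/(X^n + 1)$:

```python
def anticyclic_shift(v, l):
    """Shift v by l positions (0 <= l < n), negating the entries that wrap around."""
    n = len(v)
    return [-x for x in v[n - l:]] + v[:n - l]

def negacyclic_mul(a, b):
    """Schoolbook product in Z[X]/(X^n + 1), using X^n = -1."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                out[i + j] += a[i] * b[j]
            else:
                out[i + j - n] -= a[i] * b[j]
    return out

v = [1, 2, 3, 4]
for l in range(len(v)):
    monomial = [1 if k == l else 0 for k in range(len(v))]   # X^l
    assert anticyclic_shift(v, l) == negacyclic_mul(monomial, v)

print(anticyclic_shift(v, 1))   # [-4, 1, 2, 3]
```

The shift costs $O(n)$ instead of the $O(n^2)$ of a general product, which is why multiplication by a monomial is cheap in this ring.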

2.2 Commitment Schemes and Pedersen Commitments 

Informally, a commitment scheme is a tuple (CSetup, Commit, COpen), where CSetup generates commitment parameters, which are then used to commit to a message $m$ using Commit. A commitment cmt can then be verified using COpen. A commitment scheme needs to be binding and hiding. The former means that no cmt can be opened to two different messages, while the latter guarantees that cmt does not leak any information about the contained $m$.

The following commitment scheme was introduced by Pedersen [?]. Let a family of prime order groups $\{\mathbb{G}(\lambda)\}_{\lambda \in \mathbb{N}}$ be given such that the discrete logarithm problem is hard in $\mathbb{G}(\lambda)$ for security parameter $\lambda$, and let $q = q(\lambda)$ be the order of $\mathbb{G} = \mathbb{G}(\lambda)$.

To avoid confusion, all elements with order $q$ are denoted with a tilde in the following. To ease the presentation of our main result, we will write the group $\mathbb{G}(\lambda)$ additively.

CSetup. This algorithm chooses $\tilde{h} \xleftarrow{\$} \mathbb{G}$, $\tilde{g} \xleftarrow{\$} \langle \tilde{h} \rangle$, and outputs $cpars = (\tilde{g}, \tilde{h})$.
Commit. To commit to a message $m \in \mathcal{M} = \mathbb{Z}_q$, this algorithm first chooses $r \xleftarrow{\$} \mathbb{Z}_q$. It then outputs the pair $(cmt, o) = (m\tilde{g} + r\tilde{h}, r)$.
COpen. Given a commitment $cmt$, an opening $o$, a public key $cpars$ and a message $m$, this algorithm outputs accept if and only if $cmt = m\tilde{g} + o\tilde{h}$.

Theorem 2.1. Under the discrete logarithm assumption for G, the given commitment 
scheme is perfectly hiding and computationally binding. 
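A minimal Python sketch of the three algorithms, writing the group multiplicatively (so $m\tilde{g} + r\tilde{h}$ becomes $g^m h^r$). The order-1019 subgroup of $\mathbb{Z}_{2039}^*$ is a toy assumption and offers no security. Note also that whoever runs this CSetup learns the discrete logarithm relating $g$ and $h$ and must erase it, since knowing it breaks binding. The last line shows the additive homomorphism of the scheme.

```python
import secrets

P, Q = 2039, 1019   # toy safe prime P = 2Q + 1; the squares mod P form a subgroup of prime order Q

def csetup():
    """Pick h at random in the order-Q subgroup and g at random in <h>, both non-identity."""
    h = pow(4, 1 + secrets.randbelow(Q - 1), P)   # 4 = 2^2 is a square, hence generates the subgroup
    g = pow(h, 1 + secrets.randbelow(Q - 1), P)   # the exponent is a trapdoor and must be discarded
    return g, h

def commit(cpars, m):
    g, h = cpars
    r = secrets.randbelow(Q)
    return pow(g, m % Q, P) * pow(h, r, P) % P, r   # (cmt, o) = (g^m h^r, r)

def copen(cpars, cmt, o, m):
    g, h = cpars
    return cmt == pow(g, m % Q, P) * pow(h, o % Q, P) % P

cpars = csetup()
cmt1, o1 = commit(cpars, 42)
cmt2, o2 = commit(cpars, 100)
assert copen(cpars, cmt1, o1, 42) and not copen(cpars, cmt1, o1, 43)
assert copen(cpars, cmt1 * cmt2 % P, o1 + o2, 142)   # commitments add homomorphically
```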


2.3 Semantically Secure Encryption and NTRU 

A semantically secure (or IND-CPA secure) encryption scheme is a tuple (EncKG, Enc, Dec) of algorithms, where EncKG generates a public/private key pair, Enc can be used to encrypt a message $m$ under the public key, and the message can be recovered from the ciphertext by Dec using the secret key. Informally, while $\mathsf{Dec}(\mathsf{Enc}(m)) = m$ should always hold, knowing only the ciphertext and the public key should not leak any information about the contained message.

In this paper we present improved zero-knowledge proofs of plaintext knowledge for lattice-based encryption schemes, and show how to link messages being encrypted by these schemes to Pedersen commitments. Our improved proof of knowledge protocol will work for any Ring-LWE based scheme where the basic encryption operation consists of taking public key polynomial(s) $a_i$ and computing the ciphertext(s) $b_i = a_i s + e_i$, where $s$ and $e_i$ are polynomials with small norms. Examples of such schemes include the ring version of the “dual” encryption scheme [22], the “two element” scheme of Lyubashevsky et al. [30], and the NTRU encryption scheme [24, 35].

In this paper we will, for simplicity, only be working over the rings $R = \mathbb{Z}[X]/(X^n + 1)$ and $R_q = R/qR$, for some prime $q$. Also for simplicity, we will use NTRU as our encryption scheme because its ciphertext has only one element and is therefore simpler to describe in protocols. The NTRU scheme was first proposed by Hoffstein et al. [24], and we will be using a modification of it due to Stehlé and Steinfeld [35].




Definition 2.2. The discrete normal distribution on $\mathbb{Z}^m$ centered at $v$ with standard deviation $\sigma$ is defined by the density function $D^m_{v,\sigma}(x) = \rho^m_{v,\sigma}(x)/\rho_\sigma(\mathbb{Z}^m)$, with $\rho^m_{v,\sigma}(x) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^m e^{-\frac{\|x - v\|^2}{2\sigma^2}}$ being the continuous normal distribution on $\mathbb{R}^m$ and $\rho_\sigma(\mathbb{Z}^m) = \sum_{z \in \mathbb{Z}^m} \rho^m_\sigma(z)$ being the scaling factor required to obtain a probability distribution. When $v = 0$, we also write $D^m_\sigma = D^m_{0,\sigma}$.

We will sometimes write $u \leftarrow D_\sigma$ instead of $u \leftarrow D^n_\sigma$ for a polynomial $u \in R_q$ if there is no risk of confusion.
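A simple (non-constant-time) way to sample from $D_\sigma$ on $\mathbb{Z}$ is rejection sampling from a uniform proposal on a truncated range. The following Python sketch is one such sampler, assuming a tail cut at $12\sigma$ whose discarded probability mass is negligible; `sample_dgauss` and `sample_poly` are hypothetical names, and the sketch is not meant as a secure implementation.

```python
import math
import random


def sample_dgauss(sigma: float, rng: random.Random, tail: float = 12.0) -> int:
    """Draw x ~ D_sigma on Z (centered at 0) by rejection from a uniform proposal.

    The support is truncated to |x| <= tail * sigma. Not constant-time:
    for illustration only.
    """
    bound = math.ceil(tail * sigma)
    while True:
        x = rng.randint(-bound, bound)
        # Accept with probability exp(-x^2 / (2 sigma^2)), which is <= 1.
        if rng.random() < math.exp(-x * x / (2 * sigma * sigma)):
            return x


def sample_poly(n: int, sigma: float, rng: random.Random) -> list[int]:
    """Coefficient vector of u <- D^n_sigma, read as a polynomial in R = Z[X]/(X^n + 1)."""
    return [sample_dgauss(sigma, rng) for _ in range(n)]
```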

In the following, let $p$ be a prime less than $q$, and let $\sigma, \alpha \in \mathbb{R}$.

Message Space. The message space $\mathcal{M}$ is any subset of $\{y \in R : \|y\|_\infty < p\}$.
KeyGen. Sample $f', g$ from $D_\sigma$, set $f = pf' + 1$, and resample if $f \bmod q$ or $g \bmod q$ is not invertible. Output the public key $h = pg/f$ and the secret key $f$. Note here that $h$ is invertible.

Encrypt. To encrypt a message $m \in \mathcal{M}$, sample $s, e \leftarrow D_\alpha$ and return the ciphertext

$y = hs + pe + m \in R_q$.

Decrypt. To decrypt $y$ with secret key $f$, compute $y' = fy \in R_q$ (with coefficients lifted to $(-q/2, q/2]$) and output $m' = y' \bmod p$.
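To make the four algorithms concrete, the following toy Python sketch runs KeyGen, Encrypt and Decrypt over $R_q = \mathbb{Z}_q[X]/(X^n + 1)$ with hypothetical, insecure parameters $n = 8$, $p = 3$, $q = 65537$. As simplifying assumptions, it samples $f'$, $g$, $s$, $e$ uniformly from $\{-1, 0, 1\}$ instead of from $D_\sigma$ and $D_\alpha$, and it inverts $f$ by solving a linear system modulo $q$. Decryption recovers $m$ because $y' = fy = p(gs + fe + f'm) + m$ has small coefficients for these parameters.

```python
import random

# Hypothetical toy parameters (insecure); ternary sampling replaces D_sigma / D_alpha.
N, P, Q = 8, 3, 65537


def polymul(a, b, q):
    """Product of a and b in Z_q[X]/(X^N + 1) (negacyclic convolution)."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] = (res[i + j] + ai * bj) % q
            else:  # X^N = -1
                res[i + j - N] = (res[i + j - N] - ai * bj) % q
    return res


def polyinv(f, q):
    """Inverse of f in Z_q[X]/(X^N + 1) via Gauss-Jordan elimination mod prime q, or None."""
    cols, col = [], [c % q for c in f]
    for _ in range(N):                  # column j holds f * X^j
        cols.append(col)
        col = [(-col[-1]) % q] + col[:-1]
    # Augmented system: sum_j x_j * (f X^j) = 1.
    A = [[cols[j][r] for j in range(N)] + [int(r == 0)] for r in range(N)]
    for c in range(N):
        piv = next((r for r in range(c, N) if A[r][c]), None)
        if piv is None:
            return None
        A[c], A[piv] = A[piv], A[c]
        inv = pow(A[c][c], -1, q)
        A[c] = [v * inv % q for v in A[c]]
        for r in range(N):
            if r != c and A[r][c]:
                fac = A[r][c]
                A[r] = [(vr - fac * vc) % q for vr, vc in zip(A[r], A[c])]
    return [A[r][N] for r in range(N)]


def ternary(rng):
    return [rng.choice((-1, 0, 1)) for _ in range(N)]


def keygen(rng):
    while True:
        f = [P * c for c in ternary(rng)]
        f[0] += 1                       # f = p f' + 1
        g = ternary(rng)
        f_inv = polyinv(f, Q)
        if f_inv is None or polyinv(g, Q) is None:
            continue                    # resample, as in KeyGen
        h = polymul([P * c for c in g], f_inv, Q)   # h = p g / f
        return h, f


def encrypt(h, m, rng):
    s, e = ternary(rng), ternary(rng)
    hs = polymul(h, s, Q)
    return [(hs[i] + P * e[i] + m[i]) % Q for i in range(N)]   # y = hs + pe + m


def decrypt(f, y):
    yp = polymul(f, y, Q)                                  # y' = f y in R_q
    centered = [c - Q if c > Q // 2 else c for c in yp]   # lift to (-q/2, q/2]
    return [c % P for c in centered]                       # m' = y' mod p
```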

If the value of $\sigma$ is large enough (approximately $\tilde{O}(n\sqrt{q})$), then $g/f$ is statistically close to uniformly random in $R_q$ [35], and the security of the above scheme is based on the Ring-LWE problem. For smaller values of $\sigma$, however, the scheme is more efficient and can be based on the assumption that $h = pg/f$ is indistinguishable from uniform. This type of assumption, while not based on any worst-case lattice problem, has been around since the introduction of the original NTRU scheme over fifteen years ago. Our protocol works for either instantiation.

To obtain group signatures, we will need our encryption scheme to additionally be a commitment scheme. In other words, there should not be more than one way to obtain the same ciphertext. For the NTRU encryption scheme, this will require that we work over a modulus $q$ such that the polynomial $X^n + 1$ splits into two irreducible polynomials of degree $n/2$, which can be shown to always be the case when $n$ is a power of 2 and $q \equiv 3 \bmod 8$ [?, Lemma 3].
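The splitting condition can be checked numerically. Since $X^n + 1$ is the $2n$-th cyclotomic polynomial when $n$ is a power of 2, it factors modulo an odd prime $q$ into $n/d$ irreducible polynomials of degree $d$, where $d$ is the multiplicative order of $q$ modulo $2n$. The Python sketch below (the function name is hypothetical) computes $d$ and confirms the two-factor split for a few primes $q \equiv 3 \bmod 8$ and $n \ge 4$; it is a sanity check, not a substitute for the cited proof.

```python
def cyclotomic_split(n: int, q: int) -> tuple[int, int]:
    """Return (number of irreducible factors, their degree) of X^n + 1 mod an odd prime q.

    Assumes n is a power of 2, so X^n + 1 is the 2n-th cyclotomic polynomial;
    it then splits into n/d factors of degree d = ord_{2n}(q).
    """
    assert n & (n - 1) == 0 and q % 2 == 1
    d, x = 1, q % (2 * n)
    while x != 1:          # multiplicative order of q modulo 2n
        x = x * q % (2 * n)
        d += 1
    return n // d, d
```

For instance, $q = 3$ and $n = 4$ give two factors of degree 2, matching $X^4 + 1 \equiv (X^2 + X + 2)(X^2 + 2X + 2) \pmod 3$, whereas a prime $q \equiv 1 \bmod 2n$ makes $X^n + 1$ split completely.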
Lemma 2.3. Suppose that $q \equiv 3 \bmod 8$ and let $\#S$, $\#E$, and $\#\mathcal{M}$ be the domain sizes of the parameters $s$, $e$, and $m$ in the ciphertext $y = hs + pe + m$. Additionally suppose that for all $m \in \mathcal{M}$, $\|m\|_\infty < p/2$. Then the probability that for a random $h$, there exists a ciphertext that can be obtained in two ways is at most $\frac{(2\#\mathcal{M} + 1)\cdot(2\#S + 1)\cdot(2\#E + 1)}{q^{n/2}}$.

Note that the above lemma applies to NTRU public keys h that are uniformly random. 
If h = pg/f is not random, then the ability to come up with two plaintexts for the same 
ciphertext would constitute a distinguisher for the assumed pseudorandomness of h. 

2.4 Rejection Sampling 

For a protocol to be zero-knowledge, the prover’s responses must not depend on its 
secret inputs. However, in our protocols, the prover’s response will be from a discrete 
normal distribution which is shifted depending on the secret key. To correct for this, we 
employ rejection sampling [28, 29], where a potential response is only output with a 
certain probability, and otherwise the protocol is aborted. 

Informally, the following theorem states that for sufficiently large $\sigma$ the rejection sampling procedure outputs results that are independent of the secret. The technique only requires a constant number of iterations before a value is output, and furthermore the output is statistically close for every secret $v$ with norm at most $T$. For concrete parameters we refer to the original work of Lyubashevsky [29, Theorem 4.6].

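Theorem 2.4. Let $V$ be a subset of $\mathbb{Z}^m$ in which all elements have norms less than $T$, and let $\mathcal{H}$ be a probability distribution over $V$. Then, for any constant $M$, there exists a $\sigma = \tilde{O}(T)$ such that the output distributions of the following algorithms $\mathcal{A}$, $\mathcal{F}$ are statistically close:

$\mathcal{A}$: $v \leftarrow \mathcal{H}$; $z \leftarrow D^m_{v,\sigma}$; output $(z, v)$ with probability $\min\left(D^m_\sigma(z)/(M\,D^m_{v,\sigma}(z)),\ 1\right)$

$\mathcal{F}$: $v \leftarrow \mathcal{H}$; $z \leftarrow D^m_\sigma$; output $(z, v)$ with probability $1/M$

The probability that $\mathcal{A}$ outputs something is exponentially close to that of $\mathcal{F}$, i.e., $1/M$.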
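The following one-dimensional Python sketch illustrates algorithm $\mathcal{A}$ of Theorem 2.4 for a fixed secret $v \in \mathbb{Z}$. The choice $\sigma = 12T$ with $M = e^{12/12 + 1/(2 \cdot 12^2)} = e^{1 + 1/288}$ is our assumption, which we believe matches the parametrization in Lyubashevsky's analysis [29]; the function names are hypothetical. The test checks empirically that about a $1/M$ fraction of runs produce output and that the accepted values are centered at 0 rather than at $v$.

```python
import math
import random


def sample_dgauss(center: int, sigma: float, rng: random.Random, tail: float = 8.0) -> int:
    """x ~ D_{center,sigma} on Z, by rejection from a uniform proposal (truncated tail)."""
    bound = math.ceil(tail * sigma)
    while True:
        x = rng.randint(center - bound, center + bound)
        if rng.random() < math.exp(-(x - center) ** 2 / (2 * sigma * sigma)):
            return x


def algorithm_a(v: int, sigma: float, M: float, rng: random.Random):
    """One run of algorithm A from Theorem 2.4 with a fixed secret v (dimension 1).

    Samples z ~ D_{v,sigma} and outputs z with probability
    min(D_sigma(z) / (M * D_{v,sigma}(z)), 1); returns None on abort.
    """
    z = sample_dgauss(v, sigma, rng)
    # Both densities share the normalizer rho_sigma(Z) because v is an integer,
    # so their ratio is exp((v^2 - 2 z v) / (2 sigma^2)).
    ratio = math.exp((v * v - 2 * z * v) / (2 * sigma * sigma))
    return z if rng.random() < min(ratio / M, 1.0) else None
```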


2.5 Zero-Knowledge Proofs and Σ'-Protocols

On a high level, a zero-knowledge proof of knowledge (ZKPoK) is a two-party protocol between a prover and a verifier, which allows the former to convince the latter that it knows some secret piece of information, without revealing anything about the secret apart from what the claim itself already reveals. For a formal definition we refer to Bellare and Goldreich [?].

A language $\mathcal{L} \subseteq \{0,1\}^*$ has witness relationship $R \subseteq \{0,1\}^* \times \{0,1\}^*$ if $x \in \mathcal{L} \Leftrightarrow \exists w : (x, w) \in R$. We call $w$ a witness for $x \in \mathcal{L}$. The ZKPoKs constructed in this paper will be instantiations of the following definition, which is a straightforward generalization of Σ-protocols [?]:

Definition 2.5. Let $(P, V)$ be a two-party protocol, where $V$ is PPT, and let $\mathcal{L}, \mathcal{L}' \subseteq \{0,1\}^*$ be languages with witness relations $R, R'$ such that $R \subseteq R'$. Then $(P, V)$ is called a Σ'-protocol for $\mathcal{L}, \mathcal{L}'$ with completeness error $\alpha$, challenge set $\mathcal{C}$, public input $x$ and private input $w$, if and only if it satisfies the following conditions:
- Three-move form: The protocol is of the following form: The prover $P$, on input $(x, w)$, computes a commitment $t$ and sends it to $V$. The verifier $V$, on input $x$, then draws a challenge $c \leftarrow \mathcal{C}$ and sends it to $P$. The prover sends a response $s$ to the verifier. Depending on the protocol transcript $(t, c, s)$, the verifier finally accepts or rejects the proof. The protocol transcript $(t, c, s)$ is called accepting if the verifier accepts the protocol run.

- Completeness: Whenever $(x, w) \in R$, the verifier $V$ accepts with probability at least $1 - \alpha$.

- Special soundness: There exists a PPT algorithm $E$ (the knowledge extractor) which takes two accepting transcripts $(t, c', s')$, $(t, c'', s'')$ satisfying $c' \neq c''$ as inputs, and outputs $w'$ such that $(x, w') \in R'$.

- Special honest-verifier zero-knowledge (HVZK): There exists a PPT algorithm $S$ (the simulator) taking $x \in \mathcal{L}$ and $c \in \mathcal{C}$ as inputs, that outputs $(t, s)$ so that the triple $(t, c, s)$ is indistinguishable from an accepting protocol transcript generated by a real protocol run.

This definition differs from the standard definition of Σ-protocols in two ways. First, we allow the honest prover to fail in at most an $\alpha$-fraction of all protocol runs, whereas the standard definition requires perfect completeness, i.e., $\alpha = 0$. This relaxation is crucial in our construction, which is based on rejection sampling [28, 29], where the honest prover sometimes has to abort the protocol to achieve zero-knowledge. Second, we introduce a second language $\mathcal{L}'$ with witness relation $R' \supseteq R$, such that provers knowing a witness in $R$ are guaranteed privacy, but the verifier is only ensured that the prover knows a witness for $R'$. This has already been used in [?] and informally also in, e.g., [16, 19]. If the soundness gap between $R$ and $R'$ is sufficiently small, the implied security guarantees are often enough for higher-level applications. Note that the original definition of Σ-protocols is the special case where $\alpha = 0$ and $R = R'$.

We want to stress that previous results showing that a Σ-protocol is always also an honest-verifier ZKPoK with knowledge error $1/|\mathcal{C}|$ directly carry over to the modified definition whenever $1 - \alpha > 1/|\mathcal{C}|$. Zero-knowledge against arbitrary verifiers can be achieved by applying standard techniques such as those of Damgård et al. [14, 17].

Finally, it is a well-known result that negligible knowledge and completeness errors in $\lambda$ can be achieved, e.g., by running the protocol $\lambda$ times in parallel and accepting if and only if at least $\lambda(1 - \alpha)/2$ transcripts were valid, provided there exists a constant $c$ such that $(1 - \alpha)/2 > 1/|\mathcal{C}| + c$.
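The effect of this parallel repetition can be quantified with exact binomial tails. The Python sketch below assumes, for illustration only, that the $\lambda$ runs are independent, that an honest run is valid with probability $1 - \alpha$, and that a cheating prover's run is valid with probability at most $1/|\mathcal{C}|$; the function names are hypothetical. For $1 - \alpha = 1/3$ (i.e., $M = 3$), $|\mathcal{C}| = 512$ and $\lambda = 128$, the test checks that the completeness error is below $10^{-3}$ and the knowledge error below $10^{-30}$.

```python
import math


def binom_tail_ge(n: int, p: float, k: int) -> float:
    """Pr[Bin(n, p) >= k], summed exactly term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def parallel_errors(lam: int, alpha: float, challenge_space: int) -> tuple[float, float]:
    """Errors of lam parallel runs, accepting iff >= lam * (1 - alpha) / 2 runs are valid.

    Assumes independent runs: an honest run is valid with probability 1 - alpha,
    a cheating prover's run with probability at most 1 / |C|.
    Returns (completeness error, knowledge error).
    """
    threshold = math.ceil(lam * (1 - alpha) / 2)
    completeness_err = 1 - binom_tail_ge(lam, 1 - alpha, threshold)
    knowledge_err = binom_tail_ge(lam, 1 / challenge_space, threshold)
    return completeness_err, knowledge_err
```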

Some of the Σ'-protocols presented in this paper will further satisfy the following useful properties:

- Quasi-unique responses: No PPT adversary $\mathcal{A}$ can output $(y, t, c, s, s')$ with $s \neq s'$ such that $V(y, t, c, s) = V(y, t, c, s') = \texttt{accept}$.

- High-entropy commitments: For all $(y, w) \in R$ and for all $t$, the probability that an honestly generated commitment by $P$ takes on the value $t$ is negligible.


3 Proving Knowledge of Ring-LWE Secrets 

In the following we show how to efficiently prove knowledge of $s, e$ with short $2s, 2e$ such that $2y = 2as + 2e$. This basic protocol can easily be adapted for proving more complex relations including more than one public image or more than two secret witnesses. Before presenting the protocol, we prove a technical lemma that is at the heart of its knowledge extractor.

3.1 A Technical Lemma 

The following lemma guarantees that certain binomials in Z[X]/(X n + 1) can be in- 
verted, and their inverses have only small coefficients. 

Lemma 3.1. Let $n$ be a power of 2 and let $0 \le i, j \le 2n - 1$ with $i \neq j$. Then $2(X^i - X^j)^{-1} \bmod (X^n + 1)$ only has coefficients in $\{-1, 0, 1\}$.

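Proof. Without loss of generality, assume that $j > i$. Using that $X^n = -1 \bmod (X^n + 1)$, we have that $2(X^i - X^j)^{-1} = -2X^{n-i}(1 - X^{j-i})^{-1}$. Since multiplying by a monomial $\pm X^k$ only permutes the coefficients and possibly flips their signs, it is therefore sufficient to prove the claim for $i = 0$ only.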

Now remark that, for every $k \ge 1$, it holds that $(1 - X^j)(1 + X^j + X^{2j} + \dots + X^{(k-1)j}) = 1 - X^{kj}$.

Let us write $j = 2^{j'} j''$, with $j''$ a positive odd integer and $0 \le j' \le \log_2(n)$ (since $j < 2n$), and let us choose $k = 2^{\log_2(n) - j'}$ (recall that $n$ is a power of 2). We then have $jk = nj''$, and $X^{kj} = (-1)^{j''} = -1 \bmod (X^n + 1)$, hence $1 - X^{kj} = 2 \bmod (X^n + 1)$. Therefore, we have

$$2(1 - X^j)^{-1} = 1 + X^j + X^{2j} + \dots + X^{(k-1)j} \bmod (X^n + 1) = 1 \pm X^{j \bmod n} \pm X^{2j \bmod n} \pm \dots \pm X^{(k-1)j \bmod n} \bmod (X^n + 1).$$

Finally, in this equation, no two exponents are equal, since otherwise $n$ would divide $jk'$ for some $1 \le k' < k$, which is impossible by definition of $k$. □
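The proof is constructive, so it can be checked mechanically. The Python sketch below (function names hypothetical) builds $2(X^i - X^j)^{-1}$ exactly as in the proof, namely as $X^{-i}$ times the geometric series $1 + X^d + \dots + X^{(k-1)d}$ with $d = j - i$, and the test verifies for small powers of two that the result times $X^i - X^j$ equals 2 and that all coefficients lie in $\{-1, 0, 1\}$.

```python
def negacyclic_mul(a: list[int], b: list[int]) -> list[int]:
    """Product in Z[X]/(X^n + 1) of two coefficient vectors of length n."""
    n = len(a)
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                res[i + j] += ai * bj
            else:
                res[i + j - n] -= ai * bj  # X^n = -1
    return res


def monomial(e: int, n: int) -> list[int]:
    """X^e in Z[X]/(X^n + 1) for any integer e (uses X^n = -1 and X^(2n) = 1)."""
    e %= 2 * n
    v = [0] * n
    v[e % n] = 1 if e < n else -1
    return v


def twice_inverse_binomial(i: int, j: int, n: int) -> list[int]:
    """2 (X^i - X^j)^(-1) mod X^n + 1, built as in the proof of Lemma 3.1 (i != j)."""
    if i > j:
        return [-c for c in twice_inverse_binomial(j, i, n)]
    d = j - i                  # reduce to 2 (1 - X^d)^(-1), up to the monomial X^(-i)
    odd = d
    while odd % 2 == 0:        # d = 2^{j'} * j'' with j'' odd
        odd //= 2
    k = n * odd // d           # k = 2^{log2(n) - j'}, so that X^{kd} = -1
    series = [0] * n           # 1 + X^d + ... + X^{(k-1)d} = 2 (1 - X^d)^(-1)
    for t in range(k):
        series = [s + c for s, c in zip(series, monomial(t * d, n))]
    return negacyclic_mul(monomial(-i, n), series)
```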


3.2 The Protocol 

We next present our basic protocol. Let therefore $y = as + e$, where the LWE-secrets $s, e \leftarrow D_\alpha$ are chosen from a discrete Gaussian distribution with standard deviation $\alpha$. Protocol 3.2 now allows a prover to convince a verifier that it knows $s'$ and $e'$ such that $2y = 2as' + 2e'$ with $2s'$ and $2e'$ being short (after reduction modulo $q$), i.e., the verifier is ensured that the prover knows short secrets for twice the public input. Here, by short we mean the following: an honest prover will always be able to convince the verifier whenever $\|s\|, \|e\| \le \tilde{O}(\sqrt{n}\alpha)$, which is the case with overwhelming probability if they were generated honestly. On the other hand, the verifier is guaranteed that the prover knows LWE-secrets with norm at most $\tilde{O}(n^2\alpha)$. This soundness gap on the size of the witnesses is akin to those in, e.g., [?].

To be able to simulate aborts when proving the zero-knowledge property of the protocol, we must not send the prover's first message in the clear, but commit to it and open it only in the last round of the Σ'-protocol. We therefore make use of an auxiliary commitment scheme (aCSetup, aCommit, aCOpen), and assume that honestly generated commitment parameters are given as common input to both parties. We do not make any assumptions on the auxiliary commitment scheme. However, if it is computationally binding, the resulting protocol is only sound under the respective assumption, and similarly if it is computationally hiding. For simplicity, the reader may just think of the scheme as a random oracle.

Prover (knows $s, e$) and verifier (both know $a$, $y$ and the auxiliary commitment parameters):

- Prover: sample $r_s, r_e \leftarrow D_{\tilde{O}(\sqrt{n}\alpha)}$, compute $t = ar_s + r_e$ and $(c_{aux}, d_{aux}) \leftarrow \mathsf{aCommit}(t)$, and send $c_{aux}$ to the verifier.

- Verifier: draw $c \leftarrow \mathcal{C} = \{0, \dots, 2n - 1\}$ and send $c$ to the prover.

- Prover: compute $s_s = r_s + X^c s$ and $s_e = r_e + X^c e$. With probability $\min\left(\frac{D_{\tilde{O}(\sqrt{n}\alpha)}((s_e, s_s))}{M \cdot D_{(X^c e, X^c s), \tilde{O}(\sqrt{n}\alpha)}((s_e, s_s))},\ 1\right)$ send $(t, d_{aux}, (s_s, s_e))$ to the verifier; otherwise abort.

- Verifier: accept if and only if $X^c y + t = as_s + s_e$, $\mathsf{aCOpen}(t, c_{aux}, d_{aux}) = \texttt{accept}$, and $\|s_s\|, \|s_e\| \le \tilde{O}(n\alpha)$.

Protocol 3.2. Proof of knowledge of LWE-secrets $s, e$ such that $y = as + e$

Theorem 3.3. Protocol 3.2 is an HVZK Σ'-protocol for the following relations:

$R = \{((a, y), (s, e)) : y = as + e \wedge \|s\|, \|e\| \le \tilde{O}(\sqrt{n}\alpha)\}$,

$R' = \{((a, y), (s, e)) : 2y = 2as + 2e \wedge \|2s\|, \|2e\| \le \tilde{O}(n^2\alpha)\}$,

where $2s$ and $2e$ are reduced modulo $q$. The protocol has a knowledge error of $1/(2n)$, a completeness error of $1 - 1/M$, and high-entropy commitments.

We remark that in Protocol 3.2 the rejection sampling is applied on the whole vector $(s_e, s_s)$ instead of applying it twice, on $s_e$ and on $s_s$ separately. This yields better parameters ($M$, or the "$\sigma = \tilde{O}(T)$" in Theorem 2.4) by a factor of about $\sqrt{2}$, because of the use of the Euclidean norm.
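As a sanity check of completeness, the following Python sketch runs honest executions of Protocol 3.2. It relies on several assumptions that are not part of the protocol description: toy parameters $n = 8$, $q = 12289$; ternary secrets $s, e$ instead of samples from $D_\alpha$; a masking deviation $\sigma = 12T$ with $M = e^{1 + 1/288}$ for an assumed bound $T$ on $\|(X^c e, X^c s)\|$; a verification bound $3\sqrt{n}\sigma$ standing in for $\tilde{O}(n\alpha)$; and SHA-256 with a random opening as the auxiliary commitment, in the spirit of modelling it as a random oracle. All function names are hypothetical.

```python
import hashlib
import math
import random
import secrets

# Hypothetical toy parameters (insecure): R_q = Z_q[X]/(X^N + 1).
N, Q = 8, 12289
T = 6 * math.sqrt(2)              # assumed bound on ||(X^c e, X^c s)||
SIGMA = 12 * T                    # masking deviation, sigma = 12 T
M = math.exp(1 + 1 / 288)         # repetition rate matching sigma = 12 T
BOUND = 3 * math.sqrt(N) * SIGMA  # stand-in for the O~(n alpha) norm check


def mul(a, b, q=None):
    """Negacyclic product in Z[X]/(X^N + 1), reduced mod q if q is given."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj  # X^N = -1
    return [x % q for x in res] if q else res


def xpow(c):
    """X^c in Z[X]/(X^N + 1) for a challenge 0 <= c < 2N."""
    v = [0] * N
    v[c % N] = 1 if c < N else -1
    return v


def gauss_vec(rng, tail=8):
    """N samples from the discrete normal D_SIGMA on Z (rejection from uniform)."""
    bound, out = math.ceil(tail * SIGMA), []
    while len(out) < N:
        x = rng.randint(-bound, bound)
        if rng.random() < math.exp(-x * x / (2 * SIGMA ** 2)):
            out.append(x)
    return out


def acommit(t):
    """Auxiliary commitment: SHA-256 of a random opening d and t."""
    d = secrets.token_bytes(32)
    return hashlib.sha256(d + repr(t).encode()).digest(), d


def acopen(t, c_aux, d):
    return hashlib.sha256(d + repr(t).encode()).digest() == c_aux


def run_protocol(a, s, e, rng):
    """One honest run; returns (c_aux, c, response), with response None on abort."""
    r_s, r_e = gauss_vec(rng), gauss_vec(rng)
    t = [(u + w) % Q for u, w in zip(mul(a, r_s, Q), r_e)]
    c_aux, d_aux = acommit(t)
    c = rng.randrange(2 * N)                    # verifier's challenge
    v = mul(xpow(c), e) + mul(xpow(c), s)       # center (X^c e, X^c s)
    z = [u + w for u, w in zip(r_e + r_s, v)]   # (s_e, s_s)
    # Rejection sampling on the whole vector: accept with
    # min(D_sigma(z) / (M * D_{v,sigma}(z)), 1).
    dot = sum(u * w for u, w in zip(z, v))
    ratio = math.exp((sum(u * u for u in v) - 2 * dot) / (2 * SIGMA ** 2))
    if rng.random() >= min(ratio / M, 1.0):
        return c_aux, c, None
    return c_aux, c, (t, d_aux, z[N:], z[:N])


def verify(a, y, c_aux, c, resp):
    if resp is None:
        return False
    t, d_aux, s_s, s_e = resp
    lhs = [(u + w) % Q for u, w in zip(mul(xpow(c), y, Q), t)]
    rhs = [(u + w) % Q for u, w in zip(mul(a, s_s, Q), s_e)]
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return lhs == rhs and acopen(t, c_aux, d_aux) and max(norm(s_s), norm(s_e)) <= BOUND
```

Roughly a $1/M \approx 0.37$ fraction of honest runs ends with a response, and every such response passes all three verifier checks.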


Proof. We need to prove the properties from Definition 2.5.

Completeness. First note that by Theorem 2.4 the prover will respond with probability (exponentially close to) $1/M$. If the prover does not abort, we have that:


$$a s_s + s_e = a(r_s + X^c s) + (r_e + X^c e) = X^c(as + e) + (ar_s + r_e) = X^c y + t.$$


For the norms we have that $\|s_s\| \le \|r_s\| + \|X^c s\| = \|r_s\| + \|s\| \le \tilde{O}(n\alpha)$ with overwhelming probability, as the standard deviation of the coefficients of $r_s$ is $\tilde{O}(\sqrt{n}\alpha)$, and similarly for $s_e$.

Honest-verifier zero-knowledge. Given a challenge value $c$, the simulator outputs the tuple $(\mathsf{aCommit}(0), c, \perp)$ with probability $1 - 1/M$. With probability $1/M$, the simulator $S$ proceeds as follows: it chooses $s_s, s_e \leftarrow D_{\tilde{O}(\sqrt{n}\alpha)}$, computes $t = as_s + s_e - X^c y$ and $(c_{aux}, d_{aux}) \leftarrow \mathsf{aCommit}(t)$, and finally outputs $(c_{aux}, c, (t, d_{aux}, (s_s, s_e)))$.

It follows from Theorem 2.4 that if no abort occurs, the distribution of $(s_e, s_s)$ does not depend on $s, e$, and thus simulated and real protocol transcripts are indistinguishable. In case an abort occurs, the indistinguishability follows from the hiding property of aCommit and the fact that aborts are equally likely for every $c$.

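Special soundness. Assume that we are given $(c_{aux}, c', (t', d'_{aux}, (s'_s, s'_e)))$ and $(c_{aux}, c'', (t'', d''_{aux}, (s''_s, s''_e)))$ with $c' \neq c''$, both passing the checks performed by the verifier. From the binding property of the auxiliary commitment scheme we get that $t' = t'' =: t$. Now, by subtracting the verification equations we get $(X^{c'} - X^{c''})y = a(s'_s - s''_s) + (s'_e - s''_e)$. Multiplying by $2(X^{c'} - X^{c''})^{-1}$ yields

$$2y = a\,\frac{2(s'_s - s''_s)}{X^{c'} - X^{c''}} + \frac{2(s'_e - s''_e)}{X^{c'} - X^{c''}} =: 2a\bar{s} + 2\bar{e}.$$

Furthermore, we get that $\|2\bar{s}\| \le \|s'_s - s''_s\| \cdot \sqrt{n} \cdot \left\|\frac{2}{X^{c'} - X^{c''}}\right\| \le \tilde{O}(n^2\alpha)$, where the first inequality uses $\|uv\| \le \sqrt{n}\,\|u\|\,\|v\|$ in $R$, and the second inequality uses Lemma 3.1; similarly for $\|2\bar{e}\|$. Hence $((a, y), (\bar{s}, \bar{e})) \in R'$.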

High-entropy commitments. This directly follows from the security of the auxiliary 
commitment scheme. □ 
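The knowledge extractor from the special-soundness argument can be sketched in Python as follows (toy parameters and function names are hypothetical). Given two accepting responses for the same $t$ and challenges $c' \neq c''$, it multiplies the differences by $2(X^{c'} - X^{c''})^{-1}$, built as in the proof of Lemma 3.1. For transcripts produced by an honest prover, the extracted $(2\bar{s}, 2\bar{e})$ equals $(2s, 2e)$ exactly, which the test checks together with $2y = 2a\bar{s} + 2\bar{e} \bmod q$.

```python
import random

N, Q = 8, 12289   # hypothetical toy parameters


def mul(a, b, q=None):
    """Negacyclic product in Z[X]/(X^N + 1), reduced mod q if q is given."""
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                res[i + j] += ai * bj
            else:
                res[i + j - N] -= ai * bj
    return [x % q for x in res] if q else res


def mono(e):
    """X^e in Z[X]/(X^N + 1) for any integer e."""
    e %= 2 * N
    v = [0] * N
    v[e % N] = 1 if e < N else -1
    return v


def twice_inv_binomial(i, j):
    """2 (X^i - X^j)^(-1) in Z[X]/(X^N + 1), as constructed in the proof of Lemma 3.1."""
    if i > j:
        return [-x for x in twice_inv_binomial(j, i)]
    d = j - i
    odd = d
    while odd % 2 == 0:
        odd //= 2
    series = [0] * N
    for t in range(N * odd // d):
        series = [x + y for x, y in zip(series, mono(t * d))]
    return mul(mono(-i), series)


def extract(c1, resp1, c2, resp2):
    """From accepting responses (s'_s, s'_e), (s''_s, s''_e) for the same t,
    return (2 s_bar, 2 e_bar) with 2y = 2a s_bar + 2 e_bar (mod q)."""
    (s1, e1), (s2, e2) = resp1, resp2
    w = twice_inv_binomial(c1, c2)
    two_s = mul([u - v for u, v in zip(s1, s2)], w)
    two_e = mul([u - v for u, v in zip(e1, e2)], w)
    return two_s, two_e
```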


By Section 2.5, both the completeness and the knowledge error can be made negligible if $n > M$, since then $(1 - \alpha)/2 = 1/(2M) > 1/(2n) = 1/|\mathcal{C}|$.


4 Proving Equality among Classical and Lattice-Based Primitives 


In the following we show how our basic protocol from Section 3 can be used to link number-theoretic and lattice-based primitives via zero-knowledge proofs of knowledge. We exemplify this by showing how to prove that the messages contained in Pedersen commitments correspond to the plaintext in an encryption under the secure version of NTRU. We want to stress that in particular the choice of the encryption scheme is arbitrary, and it is easy to exchange it for other schemes, including standard NTRU [24] or Ring-LWE encryption [30].

Let $y = hs + pe + m \in R_q$ be the NTRU encryption of a message $m \in \{0, 1\}^n$, and let $p > 2n^2$ be coprime with $q$. Let further $g, h$ be Pedersen commitment parameters, cf. Section 2.2, and let $cmt_i = m_i g + r_i h$ for $i = 0, \dots, n - 1$ be commitments to the coefficients of $m$, where the order of $g$ and $h$ is $q > 2n^2$.

Then Protocol 4.1 can be used to prove, in zero-knowledge, that the commitments and the ciphertext are broadly well-formed and consistent, i.e., contain the same message. More precisely, the protocol guarantees the verifier that the prover knows the plaintext encrypted in $2y$, and that the coefficients of the respective message are all smaller than $p$. Furthermore, it shows that this message is the same as the one contained in the $2cmt_i$, i.e., $2y$ and the $2cmt_i$ are consistent.
Protocol 4.1. Proof that Pedersen commitments and NTRU encryption contain the same plaintext


Theorem 4.2. Protocol 4.1 is an HVZK Σ'-protocol for the following relations:

$$R = \Big\{\big((g, h, (cmt_i)_{i=0}^{n-1}, h, p, y), (m, s, e, (r_i)_{i=0}^{n-1})\big) : y = hs + pe + m \wedge \bigwedge_{i=0}^{n-1} cmt_i = m_i g + r_i h \wedge \|m\|_\infty \le 1 \wedge \|s\|, \|e\| \le \tilde{O}(\sqrt{n}\alpha)\Big\},$$

$$R' = \Big\{\big((g, h, (cmt_i)_{i=0}^{n-1}, h, p, y), (m, s, e, (r_i)_{i=0}^{n-1})\big) : 2y = 2hs + 2pe + 2m \wedge \bigwedge_{i=0}^{n-1} 2cmt_i = (2m \bmod q)_i\, g + 2r_i h \wedge \|2m\|_\infty < 2n^2 \wedge \|2s\|, \|2e\| \le \tilde{O}(n^2\alpha)\Big\},$$

where $(2m \bmod q)_i$ is the $i$-th coefficient of $2m \in R_q$. The protocol has a knowledge error of $1/(2n)$ and a completeness error of $1 - 1/M$.

Furthermore, if for the auxiliary commitment scheme a commitment binds the user not only to the message but also to the opening information, the protocol has quasi-unique responses and high-entropy commitments.


A detailed proof is given in the full version. By the remark in Section 2.5, both the completeness and the knowledge error can be made negligible if $n > M$.
5 Application to Group Signatures

We next show how Protocol 4.1 can be used to construct a group signature scheme with signature size logarithmic in the number of group members. The scheme is private under lattice assumptions, but traceable/unforgeable under non-lattice assumptions. As argued in the introduction, this may be realistic in applications where privacy needs to be guaranteed in the long term. For example, if group signatures are used to sign votes in electronic elections, unforgeability is mainly important when the votes are counted, but privacy needs to be preserved long after that.

Before presenting the actual signature scheme, we will prove secure a variation of a generic construction that we believe is folklore, as it underlies several direct schemes in the literature [3, 8] and was explicitly described by Chase and Lysyanskaya [11]. The resulting construction satisfies the following definition of group signatures providing full (CCA) anonymity put forth by Bellare et al. [5].

Definition 5.1. A group signature scheme is a tuple (GKG, GSign, GVerify, GOpen) where:

- On input $1^\lambda, 1^N$, the key generation algorithm GKG outputs a group public key $gpk$, an opening key $gok$, and a vector of $N$ signing keys $gsk$, where $gsk[i]$ is given to user $i \in \{1, \dots, N\}$.

- On input $gsk = gsk[i]$ and message $m \in \mathcal{M}$, the signing algorithm GSign outputs a group signature $\sigma$.

- On input $gpk, m, \sigma$, the verification algorithm GVerify outputs accept or reject.

- On input $gok, m, \sigma$, the opening algorithm GOpen outputs the identity of the purported signer $i \in \{1, \dots, N\}$ or $\perp$ to indicate failure.

The algorithms satisfy the following properties:

- Correctness: Verification accepts whenever keys and signatures are honestly generated, i.e., for all $\lambda, N \in \mathbb{N}$, all $i \in \{1, \dots, N\}$, and all $m \in \mathcal{M}$:

$$\Pr\left[\mathsf{GVerify}(gpk, m, \sigma) = \texttt{accept} : (gpk, gok, gsk) \leftarrow \mathsf{GKG}(1^\lambda, 1^N),\ \sigma \leftarrow \mathsf{GSign}(gsk[i], m)\right] = 1.$$

- Anonymity: One cannot tell which signer generated a particular signature, even when given access to an opening oracle. Referring to Figure 1, for all PPT $\mathcal{A}$ there exists a negligible function negl such that

$$\left|\Pr[\mathrm{Exp}^{\mathrm{anon}\text{-}0}_{\mathcal{A}}(\lambda) = 1] - \Pr[\mathrm{Exp}^{\mathrm{anon}\text{-}1}_{\mathcal{A}}(\lambda) = 1]\right| \le \mathrm{negl}(\lambda).$$

- Traceability: One cannot generate a signature that cannot be opened or that opens to an honest user. Referring to Figure 1, for all PPT $\mathcal{A}$ there exists a negligible function negl such that

$$\Pr[\mathrm{Exp}^{\mathrm{trace}}_{\mathcal{A}}(\lambda) = 1] \le \mathrm{negl}(\lambda).$$
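Experiment $\mathrm{Exp}^{\mathrm{anon}\text{-}b}_{\mathcal{A}}(\lambda)$:
  $(gpk, gok, gsk) \leftarrow \mathsf{GKG}(1^\lambda, 1^N)$
  $(st, i_0, i_1, m^*) \leftarrow \mathcal{A}^{\mathsf{GOpen}(gok, \cdot, \cdot)}(gpk, gsk)$
  $\sigma^* \leftarrow \mathsf{GSign}(gsk[i_b], m^*)$
  $b' \leftarrow \mathcal{A}^{\mathsf{GOpen}(gok, \cdot, \cdot)}(\sigma^*, st)$
  If $(m^*, \sigma^*) \notin Q_{\mathsf{GOpen}}$ then return $b'$ else return 0

Experiment $\mathrm{Exp}^{\mathrm{trace}}_{\mathcal{A}}(\lambda)$:
  $(gpk, gok, gsk) \leftarrow \mathsf{GKG}(1^\lambda, 1^N)$
  $(m, \sigma) \leftarrow \mathcal{A}^{\mathsf{GSign}(gsk[\cdot], \cdot),\, gsk[\cdot]}(gpk, gok)$
  $i \leftarrow \mathsf{GOpen}(gok, m, \sigma)$
  If $\mathsf{GVerify}(gpk, m, \sigma) = \texttt{accept} \wedge i \notin Q_{gsk} \wedge (i, m) \notin Q_{\mathsf{GSign}}$ then return 1 else return 0

Fig. 1. The anonymity (first) and traceability (second) experiments for group signatures. The sets $Q_{\mathsf{GOpen}}$, $Q_{\mathsf{GSign}}$, $Q_{gsk}$ contain all queries $(m, \sigma)$, $(i, m)$, and $i$ that $\mathcal{A}$ submitted to its GOpen, GSign, and gsk oracles, respectively.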


5.1 Building Blocks 

The construction is based on weakly unforgeable standard signatures and signature proofs of knowledge. In the following, we recap the respective definitions.

Informally, a signature scheme is a triple (SKG, SSign, SVerify), where SKG generates a signing/verification key pair (ssk, spk), SSign can be used to sign a message $m$ using the signing key, and SVerify can be used to check the validity of a signature using only the public verification key. It should hold that honestly computed signatures are always valid, and that no adversary can come up with a valid signature on a new message after having received signatures on messages that he chose before obtaining spk. A formal definition can be found in the full version.

Concerning signature proofs of knowledge, we adapt the definitions of Chase and Lysyanskaya [11] to allow for signatures in the random-oracle model (ROM) that are simulated by programming the random oracle $H$ and extracted through rewinding. We also generalize the definition to allow for a soundness gap: signing is performed using a witness from $R$ for a language $\mathcal{L}$, while extraction only guarantees that the signer knows a witness from $R' \supseteq R$ for the language $\mathcal{L}'$. Finally, we add a definition of simulation soundness, meaning that an adversary cannot produce new signatures for false statements even after seeing simulated signatures on arbitrary statements.

Definition 5.2. A signature of knowledge scheme for languages $\mathcal{L}, \mathcal{L}'$ with respective witness relations $R, R'$ is a tuple (SoKSetup, SoKSign, SoKVerify, SoKSim) where:

- On input $1^\lambda$, the setup algorithm SoKSetup outputs common parameters $sokp$.

- On input $sokp, x, w$ such that $(x, w) \in R$ and message $m \in \mathcal{M}$, the signing algorithm SoKSign outputs a signature of knowledge $sok$.

- On input $sokp, x, m, sok$, the verification algorithm SoKVerify outputs accept or reject.

- The stateful simulation algorithm SoKSim can be called in three modes. When called as $(sokp, st) \leftarrow \mathsf{SoKSim}(\mathtt{setup}, 1^\lambda, \varepsilon)$, it produces simulated parameters $sokp$, possibly keeping a trapdoor in its internal state $st$. When run as $(h, st') \leftarrow \mathsf{SoKSim}(\mathtt{ro}, Q, st)$, it produces a response $h$ for a random oracle query $Q$. When run as $(sok, st') \leftarrow \mathsf{SoKSim}(\mathtt{sign}, x, m, st)$, it produces a simulated signature of knowledge $sok$ without using a witness.
For ease of notation, let $\mathsf{StpSim}(1^\lambda)$ be the algorithm that returns the first part of $\mathsf{SoKSim}(\mathtt{setup}, 1^\lambda, \varepsilon)$, let $\mathsf{ROSim}(Q)$ be the algorithm that returns the first part of $\mathsf{SoKSim}(\mathtt{ro}, Q, st)$, let $\mathsf{SSim}(x, w, m)$ be the algorithm that returns the first part of $\mathsf{SoKSim}(\mathtt{sign}, x, m, st)$ if $(x, w) \in R$ and returns $\perp$ otherwise, and let $\mathsf{SSim}'(x, m)$ be the algorithm that returns the first part of $\mathsf{SoKSim}(\mathtt{sign}, x, m, st)$ without checking language membership. The experiment keeps a single synchronized state for SoKSim across all invocations of these derived algorithms.

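The algorithms satisfy the following properties:

- Correctness: Verification accepts whenever parameters and signatures are correctly generated, i.e., for all $\lambda \in \mathbb{N}$, all $(x, w) \in R$, and all $m \in \mathcal{M}$, there exists a negligible function negl such that

$$\Pr\left[\mathsf{SoKVerify}(sokp, x, m, sok) = \texttt{reject} : sokp \leftarrow \mathsf{SoKSetup}(1^\lambda),\ sok \leftarrow \mathsf{SoKSign}(sokp, x, w, m)\right] \le \mathrm{negl}(\lambda).$$

- Simulatability: No adversary can distinguish whether it is interacting with a real random oracle and signing oracle, or with their simulated versions. Formally, for all PPT $\mathcal{A}$ there exists a negligible function negl such that

$$\left|\Pr\left[b = 1 : sokp \leftarrow \mathsf{SoKSetup}(1^\lambda),\ b \leftarrow \mathcal{A}^{\mathsf{RO}(\cdot),\,\mathsf{SoKSign}(sokp, \cdot, \cdot, \cdot)}(sokp)\right] - \Pr\left[b = 1 : sokp \leftarrow \mathsf{StpSim}(1^\lambda),\ b \leftarrow \mathcal{A}^{\mathsf{ROSim}(\cdot),\,\mathsf{SSim}(\cdot, \cdot, \cdot)}(sokp)\right]\right| \le \mathrm{negl}(\lambda).$$

- Extractability: The only way to produce a valid signature of knowledge is by knowing a witness from $R'$. Formally, for all PPT $\mathcal{A}$ there exists an extractor $\mathsf{SoKExt}_{\mathcal{A}}$ and a negligible function negl such that

$$\Pr\left[\begin{array}{l}\mathsf{SoKVerify}(sokp, x, m, sok) = \texttt{accept} \wedge (x, w, m) \notin Q \wedge (x, w) \notin R' : \\ sokp \leftarrow \mathsf{StpSim}(1^\lambda; \rho_S),\ (x, m, sok) \leftarrow \mathcal{A}^{\mathsf{ROSim}(\cdot),\,\mathsf{SSim}(\cdot, \cdot, \cdot)}(sokp; \rho_{\mathcal{A}}), \\ w \leftarrow \mathsf{SoKExt}_{\mathcal{A}}(sokp, x, m, sok, \rho_S, \rho_{\mathcal{A}})\end{array}\right] \le \mathrm{negl}(\lambda),$$

where $Q$ is the set of queries $(x, w, m)$ that $\mathcal{A}$ submitted to its SSim oracle.

- Simulation-soundness: No adversary can produce a new signature on a false statement for $\mathcal{L}'$, even after seeing a simulated signature on an arbitrary statement. Formally, for all PPT $\mathcal{A}$ there exists a negligible function negl such that

$$\Pr\left[\begin{array}{l}\mathsf{SoKVerify}(sokp, x', m', sok') = \texttt{accept} \wedge (x', m', sok') \neq (x, m, sok) \wedge x' \notin \mathcal{L}' : \\ sokp \leftarrow \mathsf{StpSim}(1^\lambda),\ (x, m, st) \leftarrow \mathcal{A}^{\mathsf{ROSim}(\cdot)}(sokp), \\ sok \leftarrow \mathsf{SSim}'(x, m),\ (x', m', sok') \leftarrow \mathcal{A}^{\mathsf{ROSim}(\cdot)}(sok, st)\end{array}\right] \le \mathrm{negl}(\lambda).$$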
5.2 Generic Construction

A folklore construction of group signatures is to have a user's signing key be a standard signature on his identity $i$, and to have a group signature on message $m$ be an encryption of his identity together with a signature of knowledge on $m$ that the encrypted identity is equal to the identity in his signing key. The construction appeared implicitly [3, 8] or explicitly [11] in the literature, but was never proved secure.

To obtain full anonymity, this generic construction would probably require CCA security from the encryption scheme, but our NTRU variant is only semantically secure. We could apply a generic CCA-yielding transformation using random oracles or non-interactive zero-knowledge proofs of knowledge (NIZK), but this would make the signature of knowledge hopelessly inefficient. Instead, we take inspiration from the Naor-Yung construction [31, 33] by using a semantically secure scheme to encrypt the user's identity twice under two different public keys and letting the signature of knowledge prove that both ciphertexts encrypt the same plaintext. Moreover, our proof systems have a soundness gap: the adversary for the soundness game may use more noise in the ciphertexts than the encryption algorithm Enc does, and may also encrypt plaintexts outside $\{0, 1\}^n$. We therefore give a generic construction that deviates slightly from the general idea, but that is sufficient and that we can efficiently instantiate with our protocol from Section 4.

Let (EncKG, Enc, Dec) be an encryption scheme with message space $\mathcal{I}$, let $\mathcal{I}' \supseteq \mathcal{I}$, and let Enc' be an algorithm such that for all key pairs $(epk, esk) \leftarrow \mathsf{EncKG}(1^\lambda)$, for all random tapes¹ $\rho, \rho'$, and for all $i \in \mathcal{I}$, $i' \in \mathcal{I}'$:

$\mathsf{Enc}(epk, i; \rho) = \mathsf{Enc}'(epk, i; \rho)$ and $\mathsf{Dec}(esk, \mathsf{Enc}'(epk, i'; \rho')) = i'$.

The algorithm Enc' represents the way the adversary can generate the ciphertexts and still prove them valid. The above property ensures that decryption correctness holds perfectly even with Enc'. The IND-CPA property still has to hold with Enc.

For our instantiation with the NTRU encryption scheme from Section 2.3, $\mathcal{I} = \{0, 1\}^\ell$, which is identified with $\{0, \dots, 2^\ell - 1\}$ (with $\ell < n, q$), $\mathcal{I}' = \mathbb{Z}_q$ and $\rho = (s, e)$. The algorithm $\mathsf{Enc}'((h, p), i'; \rho')$ with $i' \in \mathcal{I}'$ checks that either $\rho'$ is a triple $(s, e, i'')$, or $i' \in \mathcal{I}$ and $\rho'$ is a pair of vectors $(s, e)$. In the latter case, $i''$ is just the binary vector in $\{0, 1\}^\ell$ corresponding to $i'$. In both cases, $s$ and $e$ must be such that $\|2s\|, \|2e\| \le \tilde{O}(n^2\alpha)$, and $i''$ must satisfy $i'' \in R_q$, $2i' = \sum_{j=0}^{\ell-1} 2^j (2i'' \bmod q)_j \bmod q$, and $\|i''\|_\infty \le 2n^2 < p, q$. If all these requirements are met, Enc' outputs $y \leftarrow hs + pe + i''$. We need to slightly change the algorithms EncKG and Enc to truncate the distributions of $g$, $s$ and $e$, to ensure that $\|s\|, \|e\| \le \tilde{O}(\sqrt{n}\alpha)$ and that $\|g\|$ is small enough for the decryption below. We also change the algorithm Dec: to decrypt $C = y$ with secret key $f$, it computes $y' = 2fy \in R_q$, and outputs $i' = \left(\sum_{j=0}^{\ell-1} 2^j (y' \bmod p)_j\right)/2 \bmod q$. In other words, it decrypts $2C = 2y$ into $y' \bmod p$, and then recovers the corresponding identity in $\mathcal{I}' = \mathbb{Z}_q$. This does not affect security.
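The final step of the modified Dec, mapping coefficients back to an identity in $\mathcal{I}' = \mathbb{Z}_q$, can be sketched as follows. The sketch assumes that the coefficients of $2i''$ have already been recovered from $y' \bmod p$, uses a least-significant-bit-first convention for identifying $\{0, 1\}^\ell$ with $\{0, \dots, 2^\ell - 1\}$, and divides by 2 via the inverse of 2 modulo the odd prime $q$; the function names are hypothetical.

```python
def identity_to_bits(i: int, ell: int) -> list[int]:
    """Binary vector in {0,1}^ell of an identity i in {0, ..., 2^ell - 1} (LSB first)."""
    return [(i >> j) & 1 for j in range(ell)]


def identity_from_doubled(two_coeffs: list[int], q: int) -> int:
    """Given the coefficients of 2i'', return i' = (sum_j 2^j (2i'')_j) / 2 mod q (q odd)."""
    total = sum((c % q) << j for j, c in enumerate(two_coeffs)) % q
    return total * pow(2, -1, q) % q
```

The division by 2 is exact modulo $q$, so for a binary $i''$ this returns the original identity, and for a non-binary $i''$ it returns $\sum_j 2^j i''_j \bmod q$, which is why $\mathcal{I}'$ is all of $\mathbb{Z}_q$.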

¹ To simplify notation in this section, we assume that the random tapes $\rho, \rho'$ are not necessarily uniform binary bitstrings as usual. Rather, we see $\rho$ as the list of random values that Enc directly derives from the random tape, while $\rho'$ can be seen as an auxiliary adversarial input to the Enc' algorithm.
Let (SKG, SSign, SVerify) be a signature scheme and let (SoKSetup, SoKSign, SoKVerify, SoKSim) be a signature of knowledge scheme for the languages $\mathcal{L}, \mathcal{L}'$ with witness relationships

$R = \{((spk, epk_1, epk_2, C_1, C_2), (i, sig, \rho_1, \rho_2)) : \mathsf{SVerify}(spk, i, sig) = \texttt{accept} \wedge C_1 = \mathsf{Enc}(epk_1, i; \rho_1) \wedge C_2 = \mathsf{Enc}(epk_2, i; \rho_2)\}$,

$R' = \{((spk, epk_1, epk_2, C_1, C_2), (i', sig, \rho'_1, \rho'_2)) : \mathsf{SVerify}(spk, i', sig) = \texttt{accept} \wedge C_1 = \mathsf{Enc}'(epk_1, i'; \rho'_1) \wedge C_2 = \mathsf{Enc}'(epk_2, i'; \rho'_2)\}$.

Consider the following group signature scheme with user identities $i \in \mathcal{I}$:

- GKG($1^\lambda, 1^N$): The group manager generates signing keys $(spk, ssk) \leftarrow \mathsf{SKG}(1^\lambda)$, encryption keys $(epk_1, esk_1) \leftarrow \mathsf{EncKG}(1^\lambda)$, $(epk_2, esk_2) \leftarrow \mathsf{EncKG}(1^\lambda)$, and parameters $sokp \leftarrow \mathsf{SoKSetup}(1^\lambda)$. He computes $gsk[i] \leftarrow \mathsf{SSign}(ssk, i)$ for $i \in \mathcal{I}$ and outputs $gpk = (spk, epk_1, epk_2, sokp)$, $gok = (gpk, esk_1)$, and $gsk$.

- GSign($gsk, m$): Signer $i$ computes two ciphertexts $C_1 \leftarrow \mathsf{Enc}(epk_1, i; \rho_1)$ and $C_2 \leftarrow \mathsf{Enc}(epk_2, i; \rho_2)$, computes a signature of knowledge $sok \leftarrow \mathsf{SoKSign}(sokp, (spk, epk_1, epk_2, C_1, C_2), (i, sig, \rho_1, \rho_2), m)$, where $sig = gsk[i]$, and outputs the group signature $\sigma = (C_1, C_2, sok)$.

- GVerify($gpk, m, \sigma$): To verify a group signature, one checks that $\mathsf{SoKVerify}(sokp, (spk, epk_1, epk_2, C_1, C_2), m, sok) = \texttt{accept}$.

- GOpen($gok, m, \sigma$): The opener checks that $\mathsf{GVerify}(gpk, m, \sigma) = \texttt{accept}$, and returns $i \leftarrow \mathsf{Dec}(esk_1, C_1)$.

Theorem 5.3. The group signature scheme sketched above is anonymous in the ROM if the encryption scheme is semantically secure and the signature of knowledge scheme is simulatable and simulation-sound.

Theorem 5.4. The group signature scheme is traceable in the ROM if the underlying signature scheme is weakly unforgeable and the signature of knowledge scheme is simulatable and extractable.

The proofs of the last two theorems are omitted here and are given in the full version.

5.3 Signatures of Knowledge from Σ'-Protocols

We now show a construction of the required signatures of knowledge in the random-oracle model from a signature scheme and an encryption scheme with Σ'-protocol proofs. More particularly, we require that for the signature scheme one can prove knowledge of a signature on a committed message, while for the encryption scheme one can prove that an encrypted plaintext is equal to a committed message.

Let (CSetup, Commit, COpen) be a commitment scheme, let (EncKG, Enc, Dec) be an encryption scheme with message space $\mathcal{M}$ and let Enc' be an associated algorithm as described earlier. Let $(P_s, V_s, S_s)$ be a Σ-protocol for the language $\mathcal{L}_s$ with

$R_s = \{((spk, cpars, cmt), (sig, m, o)) : \mathsf{SVerify}(spk, m, sig) = \texttt{accept} \wedge \mathsf{COpen}(cpars, m, cmt, o) = \texttt{accept}\}$.
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Let also (P e , V e , S e ) be a ^'-protocol for the languages C e , £' e with 

Re = {((epk, C, cpars , cmt ), (m, p , o)) : 

C = Er\c(epk, m; p) A COpen (cpars, m , cmt, o) = accept} , 

R'e = {(( e P ^5 cpars , cmt), (m, //, o)) : 

C = Enc^epfc, m; p') A COpen(cpars, m, cmt, 0 ) = accept} . 

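Let $\mathcal{C}_s$ and $\mathcal{C}_e$ be the challenge spaces of these respective protocols, and let $H : \{0,1\}^* \to \mathcal{C}_s \times \mathcal{C}_e$ be a hash function modeled as a random oracle. Consider the following construction of a signature of knowledge scheme for the languages $\mathcal{L}$ and $\mathcal{L}'$: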

- $\mathsf{SoKSetup}(1^\lambda)$: Return $sokp = cpars \leftarrow \mathsf{CSetup}(1^\lambda)$.

- $\mathsf{SoKSign}(sokp, x, w, m)$: Create a commitment $(cmt, o) \leftarrow \mathsf{Commit}(cpars, m)$. Compute the first round of the Σ-protocols for a signature and two encryptions, $(t_s, st_s) \leftarrow P_s((spk, cpars, cmt), (sig, m, o))$ and $(t_j, st_j) \leftarrow P_e((epk_j, C_j, cpars, cmt), (m, \rho_j, o))$ for $j = 1, 2$. Generate the challenges $(c_s, c_e) \leftarrow H(spk, cpars, cmt, epk_1, C_1, epk_2, C_2, t_s, t_1, t_2, m)$. Compute responses $s_s \leftarrow P_s(c_s, st_s)$ and $s_j \leftarrow P_e(c_e, st_j)$ for $j = 1, 2$ and output the signature of knowledge $sok = (t_s, t_1, t_2, s_s, s_1, s_2)$.

- $\mathsf{SoKVerify}(sokp, x, m, sok)$: Recompute the challenges $(c_s, c_e) \leftarrow H(spk, cpars, cmt, epk_1, C_1, epk_2, C_2, t_s, t_1, t_2, m)$. Return accept if $V_s((spk, cpars, cmt), t_s, c_s, s_s) = \mathsf{accept}$ and $V_e((epk_j, C_j, cpars, cmt), t_j, c_e, s_j) = \mathsf{accept}$ for $j = 1, 2$. Otherwise, return reject.

- $\mathsf{SoKSim}$: The simulation algorithm keeps in its state its random tape, an initially empty table $HT$ to keep track of previous random-oracle queries, and a counter $ctr$ initialized to zero. The simulator's random tape $\rho$ includes random-oracle responses $h_1, \ldots, h_{q_h + q_s} \in \mathcal{C}_s \times \mathcal{C}_e$, where $q_h$ and $q_s$ are upper bounds on the number of random-oracle and signing queries that an adversary can make. When called as $\mathsf{SoKSim}(\mathsf{setup}, 1^\lambda, \varepsilon)$, it generates commitment parameters $cpars \leftarrow \mathsf{CSetup}(1^\lambda)$ and returns $(cpars, st = (\rho, HT, ctr, cpars))$. When run as $\mathsf{SoKSim}(\mathsf{ro}, Q, st)$, it checks whether the query $Q$ was made before. If so, it returns $h_{HT[Q]}$. Otherwise, it increases the counter $ctr$, sets $HT[Q] \leftarrow ctr$, and returns $h_{ctr}$. When run as $\mathsf{SoKSim}(\mathsf{sign}, (spk, epk_1, epk_2, C_1, C_2), m, st)$, the simulator first creates a commitment $(cmt, o) \leftarrow \mathsf{Commit}(cpars, 1)$. It then increases the counter $ctr$ and parses $h_{ctr}$ as $(c_s, c_e)$. It runs the simulators $S_s, S_e$ to obtain simulated protocol transcripts $(t_s, s_s) \leftarrow S_s((spk, cpars, cmt), c_s)$ and $(t_j, s_j) \leftarrow S_e((epk_j, C_j, cpars, cmt), c_e)$ for $j = 1, 2$. If $HT[spk, cpars, cmt, epk_1, C_1, epk_2, C_2, t_s, t_1, t_2, m]$ is not defined, it sets this entry to $ctr$, so that the query is answered with $h_{ctr}$; otherwise, it aborts.
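To make the Fiat-Shamir pattern behind SoKSign, SoKVerify and SoKSim concrete, here is a minimal Python sketch under strong simplifying assumptions. A single Schnorr Σ-protocol (proving knowledge of $w$ with $y = g^w$) stands in for the signature and encryption protocols, SHA-256 reduced modulo $q$ stands in for $H$, and the toy group parameters and all function names are hypothetical. The simulator follows the description of SoKSim: a pre-sampled tape of oracle answers, a table $HT$ of tape indices, and a counter; signing queries are answered by running the HVZK simulator and programming the oracle, aborting if that point is already defined.

```python
import hashlib
import secrets

# Hypothetical toy parameters, far too small for real use: p = 2q + 1 is a
# safe prime and g = 4 generates the subgroup of prime order q.
P, Q, G = 2039, 1019, 4


def hash_to_challenge(*parts):
    """Stand-in for the random oracle H: SHA-256 of the inputs, reduced mod q."""
    digest = hashlib.sha256(repr(parts).encode()).digest()
    return int.from_bytes(digest, "big") % Q


def schnorr_commit():
    """First Schnorr message t = g^k together with the prover state k."""
    k = secrets.randbelow(Q)
    return pow(G, k, P), k


def schnorr_check(y, t, c, s):
    """Verifier test g^s == t * y^c for the relation y = g^w."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def schnorr_simulate(y, c):
    """Special HVZK simulator: choose s first, then solve for t."""
    s = secrets.randbelow(Q)
    return (pow(G, s, P) * pow(y, -c, P)) % P, s


def sok_sign(y, w, m):
    t, k = schnorr_commit()
    c = hash_to_challenge(y, t, m)  # the signed message m enters the hash
    return t, (k + c * w) % Q


def sok_verify(y, m, sok):
    t, s = sok
    return schnorr_check(y, t, hash_to_challenge(y, t, m), s)


class SoKSimulator:
    """SoKSim-style simulator that answers H from a pre-sampled tape."""

    def __init__(self, max_queries):
        self.tape = [secrets.randbelow(Q) for _ in range(max_queries)]
        self.table = {}  # HT: query -> index into the tape
        self.ctr = 0

    def ro(self, *query):
        if query not in self.table:
            self.table[query] = self.ctr
            self.ctr += 1
        return self.tape[self.table[query]]

    def sign(self, y, m):
        index = self.ctr
        self.ctr += 1
        t, s = schnorr_simulate(y, self.tape[index])
        if (y, t, m) in self.table:
            raise RuntimeError("abort: H was already queried on this input")
        self.table[(y, t, m)] = index  # program H(y, t, m) := tape[index]
        return t, s

    def verify(self, y, m, sok):
        t, s = sok
        return schnorr_check(y, t, self.ro(y, t, m), s)


w = 123
y = pow(G, w, P)
assert sok_verify(y, "msg", sok_sign(y, w, "msg"))
sim = SoKSimulator(max_queries=8)
assert sim.verify(y, "msg", sim.sign(y, "msg"))  # no witness used
```

The simulator never uses the witness $w$; it only needs control over the oracle answers, which is why simulatability is a random-oracle-model property here.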

Theorem 5.5. The above scheme is correct if the proof systems $(P_s, V_s)$ and $(P_e, V_e)$ have negligible completeness error.

Theorem 5.6. The above scheme is simulatable in the random-oracle model if the commitment scheme is hiding and the proof systems $(P_s, V_s)$ and $(P_e, V_e)$ are special HVZK and have high-entropy commitments.
Theorem 5.7. The above scheme is extractable in the random-oracle model if the commitment scheme is binding and the proof systems $(P_s, V_s)$ and $(P_e, V_e)$ are special-sound and have super-polynomial challenge spaces and negligible knowledge error.

Theorem 5.8. The above scheme is simulation-sound if the underlying commitment scheme is binding and the underlying Σ-protocols $(P_s, V_s, S_s)$ and $(P_e, V_e, S_e)$ are special-sound, have quasi-unique responses, super-polynomial challenge spaces, and negligible knowledge error.

Due to length limitations, the proofs of the previous theorems can be found in the full 
version. 
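Theorems 5.7 and 5.8 rely on special soundness: two accepting transcripts with the same first message and different challenges determine a witness. The following Python sketch illustrates only this algebraic step, for the Schnorr protocol as a hypothetical stand-in (not the protocols of this section) over a toy group. In the random-oracle model, an extractor would typically obtain two such transcripts by rewinding the signer and answering the relevant hash query differently; that rewinding is not modeled here.

```python
import secrets

# Hypothetical toy Schnorr parameters: p = 2q + 1 and g = 4 has prime order q.
P, Q, G = 2039, 1019, 4


def respond(k, c, w):
    """Schnorr response s = k + c*w mod q for the first message t = g^k."""
    return (k + c * w) % Q


def accepts(y, t, c, s):
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def extract(c1, s1, c2, s2):
    """Special-soundness extractor: from g^s1 = t*y^c1 and g^s2 = t*y^c2 it
    follows that g^(s1 - s2) = y^(c1 - c2), so w = (s1 - s2) / (c1 - c2) mod q."""
    if (c1 - c2) % Q == 0:
        raise ValueError("the two challenges must differ modulo q")
    return ((s1 - s2) * pow((c1 - c2) % Q, -1, Q)) % Q


w = secrets.randbelow(Q)
y = pow(G, w, P)
k = secrets.randbelow(Q)
t = pow(G, k, P)  # the same first message is reused for both challenges
c1, c2 = 17, 42
s1, s2 = respond(k, c1, w), respond(k, c2, w)
assert accepts(y, t, c1, s1) and accepts(y, t, c2, s2)
assert extract(c1, s1, c2, s2) == w
```

This is also why a large challenge space matters: a prover who answers only one challenge per first message can succeed with probability at most about one over the size of the challenge space.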


5.4 Σ-Protocols for Boneh-Boyen Signatures and the Group Signature Scheme

In the following we briefly recap the weakly unforgeable version of the Boneh-Boyen signature scheme [6, 7]. We assume that the reader is familiar with bilinear pairings and the strong Diffie-Hellman (SDH) assumption. The Boneh-Boyen signature scheme is defined as follows for a bilinear group generator $\mathsf{BGGen}$:

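$\mathsf{SKG}$. This algorithm first computes $(q, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, e) \leftarrow \mathsf{BGGen}(1^\lambda)$. It chooses $g_1 \xleftarrow{\$} \mathbb{G}_1^*$, $g_2 \xleftarrow{\$} \mathbb{G}_2^*$, $x \xleftarrow{\$} \mathbb{Z}_q^*$, and defines $v = x g_2$ and $z = e(g_1, g_2)$. It outputs $spk = ((q, \mathbb{G}_1, \mathbb{G}_2, \mathbb{G}_T, e), g_1, g_2, v, z)$ and $ssk = x$.

$\mathsf{SSign}$. To sign a message $m \in \mathbb{Z}_q \setminus \{-ssk\}$ with secret key $ssk = x$, this algorithm outputs the signature $sig = \frac{1}{x+m} g_1$ if $x + m \neq 0$, and $0$ otherwise.

$\mathsf{SVerify}$. Given a signature public key $spk$, a message $m \in \mathbb{Z}_q$ and a signature $sig$, this algorithm outputs accept if $v + m g_2 = 0$ in case $sig = 0$, and if $e(sig, v + m g_2) = z$ in case $sig \neq 0$. In all other cases, it outputs reject.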
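For a non-zero signature, verification succeeds because $e(sig, v + m g_2) = e(\frac{1}{x+m} g_1, (x+m) g_2) = e(g_1, g_2) = z$. The Python sketch below checks this algebra in a deliberately insecure, hypothetical toy model rather than with a real pairing library: every group element is stored as its discrete logarithm, so the "pairing" is just multiplication modulo $q$. All names and parameters are assumptions made for illustration.

```python
import secrets

# Hypothetical and deliberately INSECURE model: each group element is stored as
# its discrete logarithm, so a*g1 -> a, b*g2 -> b, e(g1, g2)^c -> c (mod q), and
# the pairing e(a*g1, b*g2) becomes a*b mod q. It checks the algebra only.
Q = 2**61 - 1  # a Mersenne prime, playing the role of the group order q


def pairing(a, b):
    return (a * b) % Q


def skg():
    x = 1 + secrets.randbelow(Q - 1)  # x <- Z_q^*
    g1, g2 = 1, 1                     # generators (discrete log 1)
    v = (x * g2) % Q                  # v = x*g2
    z = pairing(g1, g2)               # z = e(g1, g2)
    return (g1, g2, v, z), x


def ssign(x, m, g1=1):
    if (x + m) % Q == 0:
        return 0                      # the neutral element of G1
    return (pow((x + m) % Q, -1, Q) * g1) % Q  # sig = (1/(x+m))*g1


def sverify(spk, m, sig):
    g1, g2, v, z = spk
    w = (v + m * g2) % Q              # v + m*g2 = (x+m)*g2
    if sig == 0:
        return w == 0
    return pairing(sig, w) == z


spk, ssk = skg()
assert sverify(spk, 5, ssign(ssk, 5))
assert not sverify(spk, 6, ssign(ssk, 5))
assert ssign(ssk, -ssk % Q) == 0 and sverify(spk, -ssk % Q, 0)
```

The last line exercises the degenerate case $m = -ssk$, in which the signature is the neutral element and verification falls back to the check $v + m g_2 = 0$.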

Lemma 5.9. If the SDH assumption holds for BGGen, then the above scheme is a 
weakly unforgeable signature scheme. 


We next show how a user can prove possession of a Boneh-Boyen signature on a message $m$ while keeping both the message and the signature private. In addition, the proof shows that $m$ is also contained in a set of Pedersen commitments $cmt_i = m_i g + r_i h$ such that $m = \sum_i 2^i m_i$, cf. Section 2.2.
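The link between the bitwise commitments $cmt_i$ and $m$ can be seen from the additive homomorphism of Pedersen commitments: $\sum_i 2^i cmt_i$ is a commitment to $m = \sum_i 2^i m_i$ under randomness $\sum_i 2^i r_i$. The Python sketch below illustrates this in multiplicative notation ($m g + r h$ becomes $g^m h^r \bmod p$) over a hypothetical toy group with generators $g = 4$ and $h = 9$. A real instantiation needs a large group and an $h$ whose discrete logarithm to base $g$ is unknown.

```python
import secrets

# Hypothetical toy parameters (insecure size): p = 2q + 1, and g, h generate
# the order-q subgroup. Additive m*g + r*h is written g^m * h^r (mod p) here.
P, Q, G, H = 2039, 1019, 4, 9


def commit(m, r):
    return (pow(G, m, P) * pow(H, r, P)) % P


def commit_bits(m, n):
    """Commit to the n bits m_0..m_{n-1} of m, with m = sum 2^i * m_i."""
    bits = [(m >> i) & 1 for i in range(n)]
    rands = [secrets.randbelow(Q) for _ in range(n)]
    cmts = [commit(b, r) for b, r in zip(bits, rands)]
    return cmts, bits, rands


def aggregate(cmts):
    """prod cmt_i^(2^i) commits to sum 2^i m_i with randomness sum 2^i r_i
    (in additive notation: sum 2^i * cmt_i)."""
    acc = 1
    for i, c in enumerate(cmts):
        acc = (acc * pow(c, 2**i, P)) % P
    return acc


m, n = 45, 8
cmts, bits, rands = commit_bits(m, n)
assert sum(b << i for i, b in enumerate(bits)) == m
r = sum(ri << i for i, ri in enumerate(rands)) % Q
assert aggregate(cmts) == commit(m, r)
```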

The idea underlying Protocol 5.10 is similar to that in Camenisch et al. [9]: the prover first re-randomizes the signature to obtain a value $s$, which it sends to the verifier. Subsequently, the prover and the verifier run a standard Schnorr proof for the resulting statement.


Theorem 5.11. Protocol 5.10 is a perfectly HVZK Σ-proof of knowledge for the following relation:

$$\mathcal{R} = \Big\{ \big( (spk, (cmt_i)_{i=0}^{n-1}), (sig, m, (m_i, r_i)_{i=0}^{n-1}) \big) : m = \sum_{i=0}^{n-1} 2^i m_i \;\wedge\; cmt_i = m_i g + r_i h \ (i = 0, \ldots, n-1) \;\wedge\; \mathsf{SVerify}(spk, m, sig) = \mathsf{accept} \Big\}.$$
Protocol 5.10. Proof of possession of a signature on $m$, which is also contained in a set of Pedersen commitments


The protocol is perfectly complete and has a knowledge error of $1/q$. Furthermore, the protocol has quasi-unique responses (under the discrete logarithm assumption in $\mathbb{G}$) and high-entropy commitments.


The proof of this theorem is straightforward and can be found in the full version. 


The Group Signature Scheme. Combining Protocol 5.10 with the Σ-protocol for the encryption scheme now directly gives a group signature via the construction from Section 5.3. The identity space $ID$ is given by $\{0,1\}^\ell$ (which can be identified with $\{0, \ldots, 2^\ell - 1\}$), where $\ell < n$ and $n/q$ is negligible, $n$ is the dimension of the ring being used, and $q$ is the order of the groups of the commitment and the signature schemes. The condition that $n/q$ is negligible ensures that, with overwhelming probability, $-ssk \notin ID$, so that all signatures on an identity $i \in ID$ are non-zero and can be used as witnesses in Protocol 5.10. The commitment scheme $(\mathsf{CSetup}, \mathsf{Commit}, \mathsf{COpen})$ from Section 5.3 corresponds to the bit-by-bit Pedersen commitments $cmt_i$.